Re: how to get all different values for each key
Hi Harsh, after the map I can get all the values for one key, but I want to dedup these values and keep only the unique ones. Right now I do it as in the code below, and I think it is not efficient (it uses a HashSet to dedup). Thanks :)

    private static class MyReducer extends Reducer<LongWritable, LongWritable, LongWritable, LongsWritable> {
        HashSet<Long> uids = new HashSet<Long>();
        LongsWritable unique_uids = new LongsWritable();

        public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            uids.clear();
            for (LongWritable v : values) {
                uids.add(v.get());
            }
            int size = uids.size();
            long[] l = new long[size];
            int i = 0;
            for (long uid : uids) {
                l[i] = uid;
                i++;
            }
            unique_uids.set(l);
            context.write(key, unique_uids);
        }
    }

2011/8/3 Harsh J <ha...@cloudera.com>: Use MapReduce :) If the map outputs (key, value), then the reduce input becomes (key, [iterator of values across all maps with (key, value)]). I believe this is very similar to the wordcount example, minus the summing. For a given key, you get all the values that carry that key in the reducer. Have you tried to run a simple program to achieve this before asking? Or is something specifically not working? On Wed, Aug 3, 2011 at 9:20 AM, Jianxin Wang <wangjx...@gmail.com> wrote: Hi, I have many (key, value) pairs and want to get all the different values for each key. Which way is efficient for this? For example, input: 1,2 1,3 1,4 1,3 2,1 2,2 output: 1,2/3/4 2,1/2 Thanks! walter -- Harsh J
Re:Re:Re:Re:Re: one question in the book of hadoop: definitive guide 2nd edition
I understand now. And it looks like the job will print the min value instead of the max value, per my test. In stdout I can see the following data: 3 is the year (I faked the data myself), 99 is the max, and 0 is the min. We can see that for year 3 there are 100 records. So inside a group the key can differ, and context.write(key, NullWritable.get()) will write the LAST key to the output; since the temperatures are ordered descending, the last key has the min temperature:

    3 99
    3 0
    number of records for this group 100
    -biggest key is--
    3 0

    public void reduce(IntPair key, Iterable<NullWritable> values, Context context)
            throws IOException, InterruptedException {
        int count = 0;
        for (NullWritable iw : values) {
            count++;
            System.out.print(key.getFirst());
            System.out.print(' ');
            System.out.println(key.getSecond());
        }
        System.out.println("number of records for this group " + Integer.toString(count));
        System.out.println("-biggest key is--");
        System.out.print(key.getFirst());
        System.out.print(' ');
        System.out.println(key.getSecond());
        context.write(key, NullWritable.get());
    }

At 2011-08-03 11:41:23, Daniel,Wu <hadoop...@163.com> wrote: Or I should ask: should the input of the reducer for the group of year 1900 be key/value pairs like (1900,35),null (1900,34),null (1900,33),null or like (1900,35),null (1900,35),null <== since (1900,34) is in the same group as (1900,35), it uses (1900,35) as the key (1900,35),null At 2011-08-03 10:35:51, Daniel,Wu <hadoop...@163.com> wrote: So the key of a group is determined by the first record coming into the group. If we have 3 records in a group, 1: (1900,35) 2: (1900,34) 3: (1900,33), and (1900,35) comes in as the first row, then the resulting key will be (1900,35); when the second row (1900,34) comes in, it won't impact the key of the group, meaning it will not overwrite the key (1900,35) with (1900,34), correct? In the KeyComparator, these are guaranteed to come in reverse order in the second slot. That is, if 35 is the maximum temperature then (1900,35) will come before ANY other (1900,t). Then as the GroupComparator does its thing, any time (1900,t) comes up it gets compared AND FOUND EQUAL TO (1900,35), and thus its (null) value is added to the (1900,35) group. The reducer then gets a (1900,35) key with an Iterable of null values, which it pretty much discards and just emits the key, which contains the maximum value.
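For comparison, a minimal sketch of the reducer as the book presents it (it assumes the book's IntPair key, with the year in getFirst() and the temperature in getSecond()): writing the key before touching the values iterator emits the maximum, because the first key of each group is the largest under the descending sort and Hadoop reuses the same key object while the values are consumed.

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch only; IntPair is the book's custom writable (year, temperature).
    public class MaxTemperatureReducer extends Reducer<IntPair, NullWritable, IntPair, NullWritable> {
        @Override
        protected void reduce(IntPair key, Iterable<NullWritable> values, Context context)
                throws IOException, InterruptedException {
            // Do not iterate the values first: the key object is updated as they are consumed,
            // so emitting it up front keeps the group's first (maximum-temperature) key.
            context.write(key, NullWritable.get());
        }
    }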
YCSB Benchmarking for HBase
Hi, is anyone working on YCSB (Yahoo! Cloud Serving Benchmark) for HBase? I am trying to run it, and it's giving me this error:

    $ java -cp build/ycsb.jar com.yahoo.ycsb.CommandLine -db com.yahoo.ycsb.db.HBaseClient
    YCSB Command Line client
    Type "help" for command line help
    Start with "-help" for usage info
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406)
        at java.lang.Class.getConstructor0(Class.java:2716)
        at java.lang.Class.newInstance0(Class.java:343)
        at java.lang.Class.newInstance(Class.java:325)
        at com.yahoo.ycsb.CommandLine.main(Unknown Source)
    Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        ... 6 more

From the error it seems like it's not able to find the hadoop-core jar file, but it's already on the classpath. Has anyone worked on YCSB with HBase? Thanks, Praveenesh
Re: how to get all different values for each key
Hey, I feel a HashSet is a good method to dedup. To increase the overall efficiency you could also look into a Combiner running the same Reducer code. That would ensure less data in the sort-shuffle phase. Regards, Matthew On Wed, Aug 3, 2011 at 11:52 AM, Jianxin Wang <wangjx...@gmail.com> wrote: Hi Harsh, after the map I can get all the values for one key, but I want to dedup these values and keep only the unique ones. Right now I do it as in the code below, and I think it is not efficient (it uses a HashSet to dedup). Thanks :) private static class MyReducer extends Reducer<LongWritable, LongWritable, LongWritable, LongsWritable> { HashSet<Long> uids = new HashSet<Long>(); LongsWritable unique_uids = new LongsWritable(); public void reduce(LongWritable key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException { uids.clear(); for (LongWritable v : values) { uids.add(v.get()); } int size = uids.size(); long[] l = new long[size]; int i = 0; for (long uid : uids) { l[i] = uid; i++; } unique_uids.set(l); context.write(key, unique_uids); } } 2011/8/3 Harsh J <ha...@cloudera.com>: Use MapReduce :) If the map outputs (key, value), then the reduce input becomes (key, [iterator of values across all maps with (key, value)]). I believe this is very similar to the wordcount example, minus the summing. For a given key, you get all the values that carry that key in the reducer. Have you tried to run a simple program to achieve this before asking? Or is something specifically not working? On Wed, Aug 3, 2011 at 9:20 AM, Jianxin Wang <wangjx...@gmail.com> wrote: Hi, I have many (key, value) pairs and want to get all the different values for each key. Which way is efficient for this? For example, input: 1,2 1,3 1,4 1,3 2,1 2,2 output: 1,2/3/4 2,1/2 Thanks! walter -- Harsh J
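A rough sketch of the combiner idea follows. Note that the reducer posted above could not be reused verbatim as the combiner, because a combiner has to emit the map output value type (LongWritable) rather than the final LongsWritable; the class and names below are illustrative only.

    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    public class DedupCombiner extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
        @Override
        protected void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            Set<Long> seen = new HashSet<Long>();
            for (LongWritable v : values) {
                // Forward each distinct value once; the reducer still dedups across
                // map outputs, but the shuffle carries less data.
                if (seen.add(v.get())) {
                    context.write(key, new LongWritable(v.get()));
                }
            }
        }
    }

It would be registered with job.setCombinerClass(DedupCombiner.class); the final reducer is unchanged and still performs the cross-mapper dedup.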
Re: how to get all different values for each key
Thanks, Matthew! How about using secondary sort to get (key, values) where the values are sorted for every key, and then traversing the sorted values to get all the unique values? I am not sure which way is more efficient; I suspect a HashSet is a complicated data structure. 2011/8/3 Matthew John <tmatthewjohn1...@gmail.com>: Hey, I feel a HashSet is a good method to dedup. To increase the overall efficiency you could also look into a Combiner running the same Reducer code. That would ensure less data in the sort-shuffle phase. Regards, Matthew On Wed, Aug 3, 2011 at 11:52 AM, Jianxin Wang <wangjx...@gmail.com> wrote: Hi Harsh, after the map I can get all the values for one key, but I want to dedup these values and keep only the unique ones. Right now I do it as in the code below, and I think it is not efficient (it uses a HashSet to dedup). Thanks :) private static class MyReducer extends Reducer<LongWritable, LongWritable, LongWritable, LongsWritable> { HashSet<Long> uids = new HashSet<Long>(); LongsWritable unique_uids = new LongsWritable(); public void reduce(LongWritable key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException { uids.clear(); for (LongWritable v : values) { uids.add(v.get()); } int size = uids.size(); long[] l = new long[size]; int i = 0; for (long uid : uids) { l[i] = uid; i++; } unique_uids.set(l); context.write(key, unique_uids); } } 2011/8/3 Harsh J <ha...@cloudera.com>: Use MapReduce :) If the map outputs (key, value), then the reduce input becomes (key, [iterator of values across all maps with (key, value)]). I believe this is very similar to the wordcount example, minus the summing. For a given key, you get all the values that carry that key in the reducer. Have you tried to run a simple program to achieve this before asking? Or is something specifically not working? On Wed, Aug 3, 2011 at 9:20 AM, Jianxin Wang <wangjx...@gmail.com> wrote: Hi, I have many (key, value) pairs and want to get all the different values for each key. Which way is efficient for this? For example, input: 1,2 1,3 1,4 1,3 2,1 2,2 output: 1,2/3/4 2,1/2 Thanks! walter -- Harsh J
Re: Re: error:Type mismatch in value from map
It should. What's the input value class for the reducer you are setting in the Job?

2011/7/30 Daniel,Wu <hadoop...@163.com>: Thanks Joey, it works, but there is one place I don't understand: 1: the map extends Mapper<Text, Text, Text, IntWritable>, so the output value is of type IntWritable. 2: the reduce extends Reducer<Text, Text, Text, IntWritable>, so the input value is of type Text. The type of the map output should be the same as the input type of the reduce, correct? But here IntWritable != Text, and the code can run without any error -- shouldn't it complain about a type mismatch?

At 2011-07-29 22:49:31, Joey Echeverria <j...@cloudera.com> wrote: If you want to use a combiner, your map has to output the same types as your combiner outputs. In your case, modify your map to look like this:

    public static class TokenizerMapper extends Mapper<Text, Text, Text, IntWritable> {
        public void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(key, new IntWritable(1));
        }
    }

11/07/29 22:22:22 INFO mapred.JobClient: Task Id : attempt_201107292131_0011_m_00_2, Status : FAILED java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text

But I already set IntWritable in 2 places: 1: Reducer<Text,Text,Text,IntWritable> 2: job.setOutputValueClass(IntWritable.class); So where am I wrong?

    public class MyTest {

        public static class TokenizerMapper extends Mapper<Text, Text, Text, Text> {
            public void map(Text key, Text value, Context context)
                    throws IOException, InterruptedException {
                context.write(key, value);
            }
        }

        public static class IntSumReducer extends Reducer<Text, Text, Text, IntWritable> {
            public void reduce(Text key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                int count = 0;
                for (Text iw : values) {
                    count++;
                }
                context.write(key, new IntWritable(count));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // the configuration of the separator should be done in conf
            conf.set("key.value.separator.in.input.line", ",");
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: wordcount <in> <out>");
                System.exit(2);
            }
            Job job = new Job(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            // job.setReducerClass(IntSumReducer.class);
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            // job.set("key.value.separator.in.input.line", ",");
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

-- Joseph Echeverria Cloudera, Inc. 443.305.9434 -- Join me at http://hadoopworkshop.eventbrite.com/
Re: Re: error:Type mismatch in value from map
Sorry for the earlier reply. Is your combiner outputting Text,Text key/value pairs? On Wed, Aug 3, 2011 at 5:26 PM, madhu phatak <phatak@gmail.com> wrote: It should. What's the input value class for the reducer you are setting in the Job? 2011/7/30 Daniel,Wu <hadoop...@163.com>: Thanks Joey, it works, but there is one place I don't understand: 1: the map extends Mapper<Text, Text, Text, IntWritable>, so the output value is of type IntWritable. 2: the reduce extends Reducer<Text, Text, Text, IntWritable>, so the input value is of type Text. The type of the map output should be the same as the input type of the reduce, correct? But here IntWritable != Text, and the code can run without any error -- shouldn't it complain about a type mismatch? At 2011-07-29 22:49:31, Joey Echeverria <j...@cloudera.com> wrote: If you want to use a combiner, your map has to output the same types as your combiner outputs. In your case, modify your map to look like this: public static class TokenizerMapper extends Mapper<Text, Text, Text, IntWritable> { public void map(Text key, Text value, Context context) throws IOException, InterruptedException { context.write(key, new IntWritable(1)); } } 11/07/29 22:22:22 INFO mapred.JobClient: Task Id : attempt_201107292131_0011_m_00_2, Status : FAILED java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text But I already set IntWritable in 2 places: 1: Reducer<Text,Text,Text,IntWritable> 2: job.setOutputValueClass(IntWritable.class); So where am I wrong? public class MyTest { public static class TokenizerMapper extends Mapper<Text, Text, Text, Text> { public void map(Text key, Text value, Context context) throws IOException, InterruptedException { context.write(key, value); } } public static class IntSumReducer extends Reducer<Text, Text, Text, IntWritable> { public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException { int count = 0; for (Text iw : values) { count++; } context.write(key, new IntWritable(count)); } } public static void main(String[] args) throws Exception { Configuration conf = new Configuration(); // the configuration of the separator should be done in conf conf.set("key.value.separator.in.input.line", ","); String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs(); if (otherArgs.length != 2) { System.err.println("Usage: wordcount <in> <out>"); System.exit(2); } Job job = new Job(conf, "word count"); job.setJarByClass(WordCount.class); job.setMapperClass(TokenizerMapper.class); job.setCombinerClass(IntSumReducer.class); // job.setReducerClass(IntSumReducer.class); job.setInputFormatClass(KeyValueTextInputFormat.class); // job.set("key.value.separator.in.input.line", ","); job.setOutputKeyClass(Text.class); job.setOutputValueClass(IntWritable.class); FileInputFormat.addInputPath(job, new Path(otherArgs[0])); FileOutputFormat.setOutputPath(job, new Path(otherArgs[1])); System.exit(job.waitForCompletion(true) ? 0 : 1); } } -- Joseph Echeverria Cloudera, Inc. 443.305.9434 -- Join me at http://hadoopworkshop.eventbrite.com/ -- Join me at http://hadoopworkshop.eventbrite.com/
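For reference, a sketch of one way to make the original driver consistent while keeping the Text-valued mapper (an alternative to Joey's suggestion of changing the mapper; only the relevant lines of main() are shown): declare the map output types explicitly and drop the combiner, since IntSumReducer emits IntWritable values and therefore cannot sit between the Text-valued map output and the shuffle.

    Job job = new Job(conf, "word count");
    job.setJarByClass(MyTest.class);
    job.setMapperClass(TokenizerMapper.class);
    // No combiner: IntSumReducer's (Text, IntWritable) output does not match the
    // (Text, Text) records the mapper actually emits.
    job.setReducerClass(IntSumReducer.class);
    job.setInputFormatClass(KeyValueTextInputFormat.class);
    // The map output types differ from the final output types, so set them explicitly;
    // otherwise setOutputValueClass(IntWritable.class) is also applied to the map output,
    // which is what triggers the "Type mismatch in value from map" error.
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);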
Re:Re:Re: one question in the book of hadoop: definitive guide 2nd edition
On Wed, 3 Aug 2011 10:35:51 +0800 (CST), Daniel,Wu <hadoop...@163.com> wrote: So the key of a group is determined by the first record coming into the group. If we have 3 records in a group, 1: (1900,35) 2: (1900,34) 3: (1900,33), and (1900,35) comes in as the first row, then the resulting key will be (1900,35); when the second row (1900,34) comes in, it won't impact the key of the group, meaning it will not overwrite the key (1900,35) with (1900,34), correct? Effectively, yes. Remember that on the inside it's using the comparator something like this: (1900,35).. do I have that key already? [searches the collection of keys with, say, a BST] No! I'll add it here. (1900,34).. do I have that key already? [searches again, now getting a result of 0 when comparing to (1900,35)] Yes! [it's not the same key, but according to the GroupComparator it is!] So I'll add its value to the key's iterable of values. And so on.
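For reference, a minimal sketch of the grouping comparator being described (it assumes the book's IntPair key, with the year available via getFirst() as in the reduce code earlier in the thread):

    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    public class GroupComparator extends WritableComparator {
        protected GroupComparator() {
            super(IntPair.class, true);
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            int yearA = ((IntPair) a).getFirst();
            int yearB = ((IntPair) b).getFirst();
            // Only the year decides group membership, so (1900,35) and (1900,34)
            // compare as equal and their values land in the same reducer group.
            return yearA < yearB ? -1 : (yearA == yearB ? 0 : 1);
        }
    }

It would be registered on the job with setGroupingComparatorClass in the new API (setOutputValueGroupingComparator in the old API).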
Re: ivy download error while building mumak
Maybe the build is not able to connect to the central Maven repository because of the proxy.

On Fri, Jul 29, 2011 at 2:54 PM, Arun K <arunk...@gmail.com> wrote: Hi all! I have downloaded hadoop-0.21. I am behind my college proxy. I get the following error while building mumak:

    $ cd /home/arun/Documents/hadoop-0.21.0/mapred
    $ ant package
    Buildfile: build.xml
    clover.setup:
    clover.info:
         [echo]
         [echo]      Clover not found. Code coverage reports disabled.
         [echo]
    clover:
    ivy-download:
          [get] Getting: http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar
          [get] To: /home/arun/Documents/hadoop-0.21.0/mapred/ivy/ivy-2.1.0.jar
          [get] Error getting http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar to /home/arun/Documents/hadoop-0.21.0/mapred/ivy/ivy-2.1.0.jar

Any help? Thanks, Arun K -- Join me at http://hadoopworkshop.eventbrite.com/
TotalOrderPartitioner with new api - help
Good evening, I would like to ask you a question regarding the use of TotalOrderPartitioner. I am working on my diploma thesis, and I need to use the TotalOrderPartitioner (with the InputSampler of course) under Hadoop 0.20.2. In order to use it, I need to apply the patch (https://issues.apache.org/jira/browse/MAPREDUCE-366), but it fails for some reason. If I am correct, the patch modifies the TotalOrderPartitioner & InputSampler classes in the org.apache.hadoop.mapred.lib package in order to deprecate them, and then it specifies 2 new classes to be used: TotalOrderPartitioner & InputSampler in org.apache.hadoop.mapreduce.lib.partitioner, using the new API. I would like to ask if someone has successfully applied the patch. Could they send me the new classes (TotalOrderPartitioner and InputSampler) from their Hadoop installation after the patch is applied? (It affects the 2 classes both in the org.apache.hadoop.mapred.lib and org.apache.hadoop.mapreduce.lib.partitioner packages.) Or could you at least suggest another solution? I hope this will not consume your time. I apologize for the inconvenience, but I need these two classes in order to finish my diploma thesis, and I don't know whom I should ask for help. Thank you very much in advance, Sofia Georgiakaki, undergraduate student, Department of Electronic and Computer Engineering, Technical University of Crete, Greece
Re: how to get all different values for each key
Secondary sort is the way to go; it is easier to dedup a sorted input set. Although you can also try to filter in the map and combine phases to the extent safely possible (sets, etc.), to speed up the process and reduce data transfers. On Wed, Aug 3, 2011 at 4:07 PM, Jianxin Wang <wangjx...@gmail.com> wrote: Thanks, Matthew! How about using secondary sort to get (key, values) where the values are sorted for every key, and then traversing the sorted values to get all the unique values? I am not sure which way is more efficient; I suspect a HashSet is a complicated data structure. 2011/8/3 Matthew John <tmatthewjohn1...@gmail.com>: Hey, I feel a HashSet is a good method to dedup. To increase the overall efficiency you could also look into a Combiner running the same Reducer code. That would ensure less data in the sort-shuffle phase. Regards, Matthew On Wed, Aug 3, 2011 at 11:52 AM, Jianxin Wang <wangjx...@gmail.com> wrote: Hi Harsh, after the map I can get all the values for one key, but I want to dedup these values and keep only the unique ones. Right now I do it as in the code below, and I think it is not efficient (it uses a HashSet to dedup). Thanks :) private static class MyReducer extends Reducer<LongWritable, LongWritable, LongWritable, LongsWritable> { HashSet<Long> uids = new HashSet<Long>(); LongsWritable unique_uids = new LongsWritable(); public void reduce(LongWritable key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException { uids.clear(); for (LongWritable v : values) { uids.add(v.get()); } int size = uids.size(); long[] l = new long[size]; int i = 0; for (long uid : uids) { l[i] = uid; i++; } unique_uids.set(l); context.write(key, unique_uids); } } 2011/8/3 Harsh J <ha...@cloudera.com>: Use MapReduce :) If the map outputs (key, value), then the reduce input becomes (key, [iterator of values across all maps with (key, value)]). I believe this is very similar to the wordcount example, minus the summing. For a given key, you get all the values that carry that key in the reducer. Have you tried to run a simple program to achieve this before asking? Or is something specifically not working? On Wed, Aug 3, 2011 at 9:20 AM, Jianxin Wang <wangjx...@gmail.com> wrote: Hi, I have many (key, value) pairs and want to get all the different values for each key. Which way is efficient for this? For example, input: 1,2 1,3 1,4 1,3 2,1 2,2 output: 1,2/3/4 2,1/2 Thanks! walter -- Harsh J -- Harsh J
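A rough sketch of the reduce side once secondary sort is in place (it assumes the job is already configured with a composite key, partitioner, and grouping comparator so that each key's values arrive in sorted order; the class and field names are illustrative only):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SortedDedupReducer extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
        @Override
        protected void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {
            boolean first = true;
            long previous = 0;
            for (LongWritable v : values) {
                // Sorted values put duplicates next to each other, so comparing with the
                // previous value is enough -- no HashSet and no extra memory per key.
                if (first || v.get() != previous) {
                    context.write(key, v);
                    previous = v.get();
                    first = false;
                }
            }
        }
    }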
Re: TotalOrderPartitioner with new api - help
Sofia, I'd recommend using the old (actually, stable) API for development right now, when using 0.20.2. Do not be confused by the deprecation marks, since it has been un-deprecated in later releases. Using the stable API should rid you of the trouble of patching the whole thing up. I.e., use JobConf + JobClient + the 'mapred' package to build and run jobs instead of the 'Job' class and the 'mapreduce' package. On Wed, Aug 3, 2011 at 6:05 PM, Sofia Georgiakaki <geosofie_...@yahoo.com> wrote: Good evening, I would like to ask you a question regarding the use of TotalOrderPartitioner. I am working on my diploma thesis, and I need to use the TotalOrderPartitioner (with the InputSampler of course) under Hadoop 0.20.2. In order to use it, I need to apply the patch (https://issues.apache.org/jira/browse/MAPREDUCE-366), but it fails for some reason. If I am correct, the patch modifies the TotalOrderPartitioner & InputSampler classes in the org.apache.hadoop.mapred.lib package in order to deprecate them, and then it specifies 2 new classes to be used: TotalOrderPartitioner & InputSampler in org.apache.hadoop.mapreduce.lib.partitioner, using the new API. I would like to ask if someone has successfully applied the patch. Could they send me the new classes (TotalOrderPartitioner and InputSampler) from their Hadoop installation after the patch is applied? (It affects the 2 classes both in the org.apache.hadoop.mapred.lib and org.apache.hadoop.mapreduce.lib.partitioner packages.) Or could you at least suggest another solution? I hope this will not consume your time. I apologize for the inconvenience, but I need these two classes in order to finish my diploma thesis, and I don't know whom I should ask for help. Thank you very much in advance, Sofia Georgiakaki, undergraduate student, Department of Electronic and Computer Engineering, Technical University of Crete, Greece -- Harsh J
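For what it's worth, a minimal sketch of wiring the total order sort up with the stable (old) API in 0.20.2. It roughly follows the pattern of the sort examples shipped with Hadoop; the input/output paths, the SequenceFile input format, the Text key/value types, and the reducer count are placeholders, not requirements.

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.SequenceFileInputFormat;
    import org.apache.hadoop.mapred.lib.InputSampler;
    import org.apache.hadoop.mapred.lib.TotalOrderPartitioner;

    public class TotalOrderSortDriver {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(TotalOrderSortDriver.class);
            conf.setJobName("total-order-sort");
            conf.setInputFormat(SequenceFileInputFormat.class);
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            // More than one reducer, otherwise total ordering is moot (placeholder count).
            conf.setNumReduceTasks(4);

            conf.setPartitionerClass(TotalOrderPartitioner.class);

            // Where the sampled partition boundaries are written (placeholder path).
            Path partitionFile = new Path(args[0], "_partitions");
            TotalOrderPartitioner.setPartitionFile(conf, partitionFile);

            // Sample ~10% of the keys, at most 10000 samples from at most 10 splits.
            InputSampler.Sampler<Text, Text> sampler =
                new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);
            InputSampler.writePartitionFile(conf, sampler);

            // Ship the partition file to every task via the distributed cache.
            URI partitionUri = new URI(partitionFile.toString() + "#_partitions");
            DistributedCache.addCacheFile(partitionUri, conf);
            DistributedCache.createSymlink(conf);

            JobClient.runJob(conf);
        }
    }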
Re: TotalOrderPartitioner with new api - help
Thank you for your reply. This is what the creator of the patch also recommended. The problem is that I have already developed the project using the new API (I didn't know about the problems), so it won't be so easy to convert the whole job. In addition, I'm nervous wondering whether the code will run after these changes... Aren't those classes in the old API deprecated? If I had to apply a patch to un-deprecate them, it would not be a solution, since the code will be tested on the cluster at my university and I could not apply such a patch there, I suppose. In addition, it is possible that the cluster will be updated to Hadoop 0.20.203. Will I have a problem using the old API then? Hadoop is confusing, I say. Thank you, Sofia Georgiakaki
Re: YCSB Benchmarking for HBase
On Wed, Aug 3, 2011 at 6:10 AM, praveenesh kumar <praveen...@gmail.com> wrote: Hi, is anyone working on YCSB (Yahoo! Cloud Serving Benchmark) for HBase? I am trying to run it, and it's giving me this error: $ java -cp build/ycsb.jar com.yahoo.ycsb.CommandLine -db com.yahoo.ycsb.db.HBaseClient YCSB Command Line client Type "help" for command line help Start with "-help" for usage info Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration at java.lang.Class.getDeclaredConstructors0(Native Method) at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406) at java.lang.Class.getConstructor0(Class.java:2716) at java.lang.Class.newInstance0(Class.java:343) at java.lang.Class.newInstance(Class.java:325) at com.yahoo.ycsb.CommandLine.main(Unknown Source) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) ... 6 more From the error it seems like it's not able to find the hadoop-core jar file, but it's already on the classpath. Has anyone worked on YCSB with HBase? Thanks, Praveenesh

I just did http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/ycsb_cassandra_0_7_6. For HBase I followed the steps here: http://blog.lars-francke.de/2010/08/16/performance-testing-hbase-using-ycsb/ I also followed the comment at the bottom to make sure the hbase-site.xml was on the classpath. The startup script looks like this:

    CP=build/ycsb.jar:db/hbase/conf/
    for i in db/hbase/lib/* ; do
        CP=$CP:${i}
    done
    # -load  load the workload
    # -t     run the workload
    java -cp $CP com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadb \
Re: Kill Task Programmatically
Hello, you can just throw a runtime exception; in that case the task attempt will fail :) Regards, Aleksandr --- On Wed, 8/3/11, Adam Shook <ash...@clearedgeit.com> wrote: From: Adam Shook <ash...@clearedgeit.com> Subject: Kill Task Programmatically To: common-user@hadoop.apache.org <common-user@hadoop.apache.org> Date: Wednesday, August 3, 2011, 3:33 PM Is there any way I can programmatically kill or fail a task, preferably from inside a Mapper or Reducer? At any time during a map or reduce task, I have a use case where I know it won't succeed based solely on the machine it is running on. It is rare, but I would prefer to kill the task and have Hadoop start it up on a different machine as usual instead of waiting for the 10 minute default timeout. I suppose speculative execution could take care of it, but I would rather not rely on it if I am able to kill it myself. Thanks, Adam
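A minimal sketch of the "just throw" approach inside a mapper (the machine check and class names are illustrative only): an unchecked exception makes the current attempt fail immediately, and the framework reschedules it, normally on another node, up to the configured number of attempts.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class FailFastMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            if (runningOnBadMachine()) {
                // Fail this attempt right away instead of waiting for the task timeout.
                throw new RuntimeException("Failing attempt so it is rescheduled elsewhere");
            }
            // ... normal map logic ...
        }

        private boolean runningOnBadMachine() {
            return false; // placeholder for the application's own machine check
        }
    }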
RE: Kill Task Programmatically
Adam, you can use the RunningJob.killTask(TaskAttemptID taskId, boolean shouldFail) API to kill the task. Clients can get hold of a RunningJob via the JobClient and then use the running job for killing the task, etc. Refer to the API doc: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/RunningJob.html#killTask(org.apache.hadoop.mapred.TaskAttemptID, boolean) Devaraj K -----Original Message----- From: Aleksandr Elbakyan [mailto:ramal...@yahoo.com] Sent: Thursday, August 04, 2011 5:10 AM To: common-user@hadoop.apache.org Subject: Re: Kill Task Programmatically Hello, you can just throw a runtime exception; in that case the task attempt will fail :) Regards, Aleksandr --- On Wed, 8/3/11, Adam Shook <ash...@clearedgeit.com> wrote: From: Adam Shook <ash...@clearedgeit.com> Subject: Kill Task Programmatically To: common-user@hadoop.apache.org <common-user@hadoop.apache.org> Date: Wednesday, August 3, 2011, 3:33 PM Is there any way I can programmatically kill or fail a task, preferably from inside a Mapper or Reducer? At any time during a map or reduce task, I have a use case where I know it won't succeed based solely on the machine it is running on. It is rare, but I would prefer to kill the task and have Hadoop start it up on a different machine as usual instead of waiting for the 10 minute default timeout. I suppose speculative execution could take care of it, but I would rather not rely on it if I am able to kill it myself. Thanks, Adam
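A minimal sketch of using that API from a client (it assumes the old 'mapred' classes and that the caller already knows the attempt id, for example from the task logs or from the "mapred.task.id" property inside the task):

    import java.io.IOException;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.RunningJob;
    import org.apache.hadoop.mapred.TaskAttemptID;

    public class KillTaskExample {
        // shouldFail = true marks the attempt as FAILED (counts against the retry limit);
        // false just kills it so the framework reschedules it, usually on another node.
        public static void killAttempt(JobConf conf, TaskAttemptID attemptId, boolean shouldFail)
                throws IOException {
            JobClient client = new JobClient(conf);
            RunningJob running = client.getJob(attemptId.getJobID());
            running.killTask(attemptId, shouldFail);
        }

        public static void main(String[] args) throws IOException {
            // Attempt id string is illustrative only.
            killAttempt(new JobConf(),
                        TaskAttemptID.forName("attempt_201107292131_0011_m_000000_2"),
                        false);
        }
    }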