Re: how to get all different values for each key

2011-08-03 Thread Jianxin Wang
hi,harsh
After the map I can get all the values for one key, but I want to dedup these
values and get only the unique values. Now I just do it like in the image (code
below).

I think the following code is not efficient (it uses a HashSet to dedup).
Thanks :)

private static class MyReducer extends
    Reducer<LongWritable, LongWritable, LongWritable, LongsWritable>
{
    HashSet<Long> uids = new HashSet<Long>();
    LongsWritable unique_uids = new LongsWritable();
    public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException
    {
        uids.clear();
        for (LongWritable v : values)
        {
            uids.add(v.get());
        }
        int size = uids.size();
        long[] l = new long[size];
        int i = 0;
        for (long uid : uids)
        {
            l[i] = uid;
            i++;
        }
        unique_uids.set(l);
        context.write(key, unique_uids);
    }
}
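
For reference, a hedged variant of the reducer above that skips the intermediate
long[] and writes the distinct values in the 1,2/3/4 form asked for in the
original question; it emits Text instead of the custom LongsWritable, and the
class name is illustrative:

// (nested inside the job class, alongside the mapper; uses java.util.HashSet,
//  org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text,
//  org.apache.hadoop.mapreduce.Reducer)
private static class DedupToTextReducer
        extends Reducer<LongWritable, LongWritable, LongWritable, Text> {
    private final HashSet<Long> uids = new HashSet<Long>();
    private final Text out = new Text();

    public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        uids.clear();
        for (LongWritable v : values) {
            uids.add(v.get());                 // the HashSet drops duplicates
        }
        StringBuilder sb = new StringBuilder();
        for (long uid : uids) {
            if (sb.length() > 0) sb.append('/');
            sb.append(uid);
        }
        out.set(sb.toString());
        context.write(key, out);               // e.g. 1 -> 2/3/4
    }
}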


2011/8/3 Harsh J ha...@cloudera.com

 Use MapReduce :)

 If map output: (key, value)
 Then reduce input becomes: (key, [iterator of values across all maps
 with (key, value)])

 I believe this is very similar to the wordcount example, but minus the
 summing. For a given key, you get all the values that carry that key
 in the reducer. Have you tried to run a simple program to achieve this
 before asking? Or is something specifically not working?

 On Wed, Aug 3, 2011 at 9:20 AM, Jianxin Wang wangjx...@gmail.com wrote:
  HI,
  I have many <key,value> pairs now, and want to get all different
 values
  for each key, which way is efficient for this work.
 
such as input : 1,2 1,3 1,4 1,3 2,1 2,2
output: 1,2/3/4 2,1/2
 
Thanks!
 
  walter
 



 --
 Harsh J



Re:Re:Re:Re:Re: one question in the book of hadoop:definitive guide 2 edition

2011-08-03 Thread Daniel,Wu
I understand now. And it looks like the job will print the min value instead of
the max value, per my test. In the stdout I can see the following data: 3 is the
year (I faked the data myself), 99 is the max, and 0 is the min. We can see that
for year 3 there are 100 records. So inside a group the key can differ, and
context.write(key, NullWritable.get()) will write the LAST key to the output;
since the temperatures are ordered descending, the last key has the min
temperature.

3 99

3 0
number of records for this group 100
-biggest key is--
3 0


public void reduce(IntPair key, Iterable<NullWritable> values,
                   Context context
                   ) throws IOException, InterruptedException {
  int count = 0;
  for (NullWritable iw : values) {
    count++;
    System.out.print(key.getFirst());
    System.out.print(' ');
    System.out.println(key.getSecond());
  }
  System.out.println("number of records for this group " + Integer.toString(count));
  System.out.println("-biggest key is--");
  System.out.print(key.getFirst());
  System.out.print(' ');
  System.out.println(key.getSecond());
  context.write(key, NullWritable.get());
}
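
A hedged tweak of the reduce above: since the secondary sort delivers the
temperatures in descending order, the key at the start of each group already
carries the maximum, and (at least with TextOutputFormat) context.write()
serializes it right away, so writing before walking the iterable emits the max
rather than the min:

public void reduce(IntPair key, Iterable<NullWritable> values,
                   Context context
                   ) throws IOException, InterruptedException {
  // The first key of the group holds the largest temperature for this year.
  context.write(key, NullWritable.get());
  int count = 0;
  for (NullWritable nw : values) {   // optional: iterating mutates the reused key object
    count++;
  }
  System.out.println("number of records for this group " + count);
}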




At 2011-08-03 11:41:23,Daniel,Wu hadoop...@163.com wrote:
Or I should ask: should the input of the reducer for the group of year 1900 be
like

key, value pair
(1900,35), null
(1900,34), null
(1900,33), null


or like

(1900,35), null
(1900,35), null   <== since (1900,34) is in the same group as (1900,35), it
uses (1900,35) as the key.
(1900,35), null

At 2011-08-03 10:35:51,Daniel,Wu hadoop...@163.com wrote:

So the key of a group is determined by the first incoming record in the group.
If we have 3 records in a group
1: (1900,35)
2: (1900,34)
3: (1900,33)

and (1900,35) comes in as the first row, then the resulting key will be
(1900,35); when the second row (1900,34) comes in, it won't impact the key of
the group, meaning it will not overwrite the key (1900,35) with (1900,34),
correct?

in the KeyComparator, these are guaranteed to come in reverse order in the 
second slot.  That is, if 35 is the maximum temperature then (1900,35) will 
come before ANY other (1900,t).  Then as the GroupComparator does its 
thing, any time (1900,t) comes up it gets compared AND FOUND EQUAL TO 
(1900,35), and thus its (null) value is added to the (1900,35) group.  
The reducer then gets a (1900,35) key with an Iterable of null values, 
which it pretty much discards and just emits the key, which contains the 
maximum value.


YCSB Benchmarking for HBase

2011-08-03 Thread praveenesh kumar
Hi,

Anyone working on YCSB (Yahoo Cloud Service Benchmarking) for HBase ??

I am trying to run it, but it's giving me an error:

$ java -cp build/ycsb.jar com.yahoo.ycsb.CommandLine -db
com.yahoo.ycsb.db.HBaseClient

YCSB Command Line client
Type help for command line help
Start with -help for usage info
Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/hadoop/conf/Configuration
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406)
at java.lang.Class.getConstructor0(Class.java:2716)
at java.lang.Class.newInstance0(Class.java:343)
at java.lang.Class.newInstance(Class.java:325)
at com.yahoo.ycsb.CommandLine.main(Unknown Source)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 6 more

From the error, it seems like it's not able to find the hadoop-core jar file,
but it's already on the class path.
Has anyone worked on YCSB with hbase ?

Thanks,
Praveenesh


Re: how to get all different values for each key

2011-08-03 Thread Matthew John
Hey,

I feel HashSet is a good method to dedup. To increase the overall efficiency
you could also look into Combiner running the same Reducer code. That would
ensure less data in the sort-shuffle phase.

Regards,
Matthew
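
One wrinkle with reusing that exact reducer as the combiner: a combiner must
emit the same (key, value) types the map emits, and the posted reducer emits
LongsWritable. A rough sketch of a separate dedup combiner that keeps the types
at LongWritable (class and field names here are illustrative, not from the
original post):

// uses java.util.HashSet, org.apache.hadoop.io.LongWritable,
// org.apache.hadoop.mapreduce.Reducer
private static class DedupCombiner
        extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
    private final HashSet<Long> seen = new HashSet<Long>();
    private final LongWritable out = new LongWritable();

    public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        seen.clear();
        for (LongWritable v : values) {
            if (seen.add(v.get())) {      // true only the first time this value is seen
                out.set(v.get());
                context.write(key, out);  // forward each distinct value once
            }
        }
    }
}
// job.setCombinerClass(DedupCombiner.class);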

On Wed, Aug 3, 2011 at 11:52 AM, Jianxin Wang wangjx...@gmail.com wrote:

 hi,harsh
 After map, I can get all values for one key, but I want dedup these
 values, only get all unique values. now I just do it like the image.

 I think the following code is not efficient.(using a HashSet to dedup)
 Thanks:)

 private static class MyReducer extends
     Reducer<LongWritable, LongWritable, LongWritable, LongsWritable>
 {
     HashSet<Long> uids = new HashSet<Long>();
     LongsWritable unique_uids = new LongsWritable();
     public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
             throws IOException, InterruptedException
     {
         uids.clear();
         for (LongWritable v : values)
         {
             uids.add(v.get());
         }
         int size = uids.size();
         long[] l = new long[size];
         int i = 0;
         for (long uid : uids)
         {
             l[i] = uid;
             i++;
         }
         unique_uids.set(l);
         context.write(key, unique_uids);
     }
 }


 2011/8/3 Harsh J ha...@cloudera.com

 Use MapReduce :)

 If map output: (key, value)
 Then reduce input becomes: (key, [iterator of values across all maps
 with (key, value)])

 I believe this is very similar to the wordcount example, but minus the
 summing. For a given key, you get all the values that carry that key
 in the reducer. Have you tried to run a simple program to achieve this
 before asking? Or is something specifically not working?

 On Wed, Aug 3, 2011 at 9:20 AM, Jianxin Wang wangjx...@gmail.com wrote:
  HI,
  I have many <key,value> pairs now, and want to get all different
 values
  for each key, which way is efficient for this work.
 
such as input : 1,2 1,3 1,4 1,3 2,1 2,2
output: 1,2/3/4 2,1/2
 
Thanks!
 
  walter
 



 --
 Harsh J





Re: how to get all different values for each key

2011-08-03 Thread Jianxin Wang
thanks! Matthew :

    How about using SecondarySort to get <key, values>, where the values are
sorted for every key, and then traverse the sorted values to get all unique
values?

    I am not sure which way is more efficient. I suspect HashSet is a
complicated data structure.
2011/8/3 Matthew John tmatthewjohn1...@gmail.com

 Hey,

 I feel HashSet is a good method to dedup. To increase the overall
 efficiency
 you could also look into Combiner running the same Reducer code. That would
 ensure less data in the sort-shuffle phase.

 Regards,
 Matthew

 On Wed, Aug 3, 2011 at 11:52 AM, Jianxin Wang wangjx...@gmail.com wrote:

  hi,harsh
  After map, I can get all values for one key, but I want dedup these
  values, only get all unique values. now I just do it like the image.
 
  I think the following code is not efficient.(using a HashSet to
 dedup)
  Thanks:)
 
  private static class MyReducer extends
      Reducer<LongWritable, LongWritable, LongWritable, LongsWritable>
  {
      HashSet<Long> uids = new HashSet<Long>();
      LongsWritable unique_uids = new LongsWritable();
      public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
              throws IOException, InterruptedException
      {
          uids.clear();
          for (LongWritable v : values)
          {
              uids.add(v.get());
          }
          int size = uids.size();
          long[] l = new long[size];
          int i = 0;
          for (long uid : uids)
          {
              l[i] = uid;
              i++;
          }
          unique_uids.set(l);
          context.write(key, unique_uids);
      }
  }
 
 
  2011/8/3 Harsh J ha...@cloudera.com
 
  Use MapReduce :)
 
  If map output: (key, value)
  Then reduce input becomes: (key, [iterator of values across all maps
  with (key, value)])
 
  I believe this is very similar to the wordcount example, but minus the
  summing. For a given key, you get all the values that carry that key
  in the reducer. Have you tried to run a simple program to achieve this
  before asking? Or is something specifically not working?
 
  On Wed, Aug 3, 2011 at 9:20 AM, Jianxin Wang wangjx...@gmail.com
 wrote:
   HI,
   I have many <key,value> pairs now, and want to get all different
  values
   for each key, which way is efficient for this work.
  
 such as input : 1,2 1,3 1,4 1,3 2,1 2,2
 output: 1,2/3/4 2,1/2
  
 Thanks!
  
   walter
  
 
 
 
  --
  Harsh J
 
 
 



Re: Re: error:Type mismatch in value from map

2011-08-03 Thread madhu phatak
It should. What's the input value class for the reducer you are setting in the Job?

2011/7/30 Daniel,Wu hadoop...@163.com

 Thanks Joey,

 It works, but one place I don't understand:

 1: in the map

   extends Mapper<Text, Text, Text, IntWritable>
  so the output value is of type IntWritable
  2: in the reduce
  extends Reducer<Text, Text, Text, IntWritable>
  So the input value is of type Text.

  The type of the map output should be the same as the input type of the
  reduce, correct? But here
  IntWritable != Text

 And the code can run without any error, shouldn't it complain type
 mismatch?

 At 2011-07-29 22:49:31,Joey Echeverria j...@cloudera.com wrote:
 If you want to use a combiner, your map has to output the same types
 as your combiner outputs. In your case, modify your map to look like
 this:
 
    public static class TokenizerMapper
         extends Mapper<Text, Text, Text, IntWritable> {
      public void map(Text key, Text value, Context context
                      ) throws IOException, InterruptedException {
        context.write(key, new IntWritable(1));
      }
    }
 
   11/07/29 22:22:22 INFO mapred.JobClient: Task Id :
 attempt_201107292131_0011_m_00_2, Status : FAILED
  java.io.IOException: Type mismatch in value from map: expected
 org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
 
  But I already set IntWritable in 2 places,
   1: Reducer<Text, Text, Text, IntWritable>
  2:job.setOutputValueClass(IntWritable.class);
 
  So where am I wrong?
 
   public class MyTest {

     public static class TokenizerMapper
         extends Mapper<Text, Text, Text, Text> {
       public void map(Text key, Text value, Context context
                       ) throws IOException, InterruptedException {
         context.write(key, value);
       }
     }

     public static class IntSumReducer
         extends Reducer<Text, Text, Text, IntWritable> {

       public void reduce(Text key, Iterable<Text> values,
                          Context context
                          ) throws IOException, InterruptedException {
         int count = 0;
         for (Text iw : values) {
           count++;
         }
         context.write(key, new IntWritable(count));
       }
     }

     public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();
       // the configuration of the separator should be done in conf
       conf.set("key.value.separator.in.input.line", ",");
       String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
       if (otherArgs.length != 2) {
         System.err.println("Usage: wordcount <in> <out>");
         System.exit(2);
       }
       Job job = new Job(conf, "word count");
       job.setJarByClass(WordCount.class);
       job.setMapperClass(TokenizerMapper.class);
       job.setCombinerClass(IntSumReducer.class);
       // job.setReducerClass(IntSumReducer.class);
       job.setInputFormatClass(KeyValueTextInputFormat.class);
       // job.set("key.value.separator.in.input.line", ",");
       job.setOutputKeyClass(Text.class);
       job.setOutputValueClass(IntWritable.class);
       FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
       FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
       System.exit(job.waitForCompletion(true) ? 0 : 1);
     }
   }
 
 
 
 
 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434




-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Re: error:Type mismatch in value from map

2011-08-03 Thread madhu phatak
Sorry for the earlier reply. Is your combiner outputting <Text, Text> key/value
pairs?
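
For reference, a sketch of a type-consistent setup along the lines Joey
suggested earlier in this thread, where the mapper, combiner, and reducer all
pass (Text, IntWritable) pairs; note the reducer sums rather than just counts,
so it stays correct when it also runs as the combiner (a sketch, not Daniel's
exact code):

  public static class TokenizerMapper extends Mapper<Text, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    public void map(Text key, Text value, Context context)
        throws IOException, InterruptedException {
      context.write(key, ONE);                 // map output value type: IntWritable
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();                        // add partial counts produced by the combiner
      }
      context.write(key, new IntWritable(sum));
    }
  }

  // job.setMapperClass(TokenizerMapper.class);
  // job.setCombinerClass(IntSumReducer.class);   // same types in and out, so reuse is safe
  // job.setReducerClass(IntSumReducer.class);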

On Wed, Aug 3, 2011 at 5:26 PM, madhu phatak phatak@gmail.com wrote:

 It should. What's the input value class for the reducer you are setting in the Job?

 2011/7/30 Daniel,Wu hadoop...@163.com

 Thanks Joey,

 It works, but one place I don't understand:

 1: in the map

   extends Mapper<Text, Text, Text, IntWritable>
  so the output value is of type IntWritable
  2: in the reduce
  extends Reducer<Text, Text, Text, IntWritable>
  So the input value is of type Text.

  The type of the map output should be the same as the input type of the
  reduce, correct? But here
  IntWritable != Text

 And the code can run without any error, shouldn't it complain type
 mismatch?

 At 2011-07-29 22:49:31,Joey Echeverria j...@cloudera.com wrote:
 If you want to use a combiner, your map has to output the same types
 as your combiner outputs. In your case, modify your map to look like
 this:
 
    public static class TokenizerMapper
         extends Mapper<Text, Text, Text, IntWritable> {
      public void map(Text key, Text value, Context context
                      ) throws IOException, InterruptedException {
        context.write(key, new IntWritable(1));
      }
    }
 
   11/07/29 22:22:22 INFO mapred.JobClient: Task Id :
 attempt_201107292131_0011_m_00_2, Status : FAILED
  java.io.IOException: Type mismatch in value from map: expected
 org.apache.hadoop.io.IntWritable, recieved org.apache.hadoop.io.Text
 
  But I already set IntWritable in 2 places,
   1: Reducer<Text, Text, Text, IntWritable>
  2:job.setOutputValueClass(IntWritable.class);
 
  So where am I wrong?
 
   public class MyTest {

     public static class TokenizerMapper
         extends Mapper<Text, Text, Text, Text> {
       public void map(Text key, Text value, Context context
                       ) throws IOException, InterruptedException {
         context.write(key, value);
       }
     }

     public static class IntSumReducer
         extends Reducer<Text, Text, Text, IntWritable> {

       public void reduce(Text key, Iterable<Text> values,
                          Context context
                          ) throws IOException, InterruptedException {
         int count = 0;
         for (Text iw : values) {
           count++;
         }
         context.write(key, new IntWritable(count));
       }
     }

     public static void main(String[] args) throws Exception {
       Configuration conf = new Configuration();
       // the configuration of the separator should be done in conf
       conf.set("key.value.separator.in.input.line", ",");
       String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
       if (otherArgs.length != 2) {
         System.err.println("Usage: wordcount <in> <out>");
         System.exit(2);
       }
       Job job = new Job(conf, "word count");
       job.setJarByClass(WordCount.class);
       job.setMapperClass(TokenizerMapper.class);
       job.setCombinerClass(IntSumReducer.class);
       // job.setReducerClass(IntSumReducer.class);
       job.setInputFormatClass(KeyValueTextInputFormat.class);
       // job.set("key.value.separator.in.input.line", ",");
       job.setOutputKeyClass(Text.class);
       job.setOutputValueClass(IntWritable.class);
       FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
       FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
       System.exit(job.waitForCompletion(true) ? 0 : 1);
     }
   }
 
 
 
 
 --
 Joseph Echeverria
 Cloudera, Inc.
 443.305.9434




 --
 Join me at http://hadoopworkshop.eventbrite.com/




-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re:Re:Re: one question in the book of hadoop:definitive guide 2 edition

2011-08-03 Thread John Armstrong
On Wed, 3 Aug 2011 10:35:51 +0800 (CST), Daniel,Wu hadoop...@163.com
wrote:
 So the key of a group is determined by the first incoming record in the
 group. If we have 3 records in a group
 1: (1900,35)
 2: (1900,34)
 3: (1900,33)

 and (1900,35) comes in as the first row, then the resulting key will be
 (1900,35); when the second row (1900,34) comes in, it won't impact the
 key of the group, meaning it will not overwrite the key (1900,35) with
 (1900,34), correct?

Effectively, yes.  Remember that on the inside it's using the comparator
something like this:

(1900, 35).. do I have that key already? [searches collection of keys
with, say, a BST] no! I'll add it here.
(1900,34).. do I have that key already? [searches again, now getting a
result of 0 when comparing to (1900,35)] yes! [it's not the same key, but
according to the GroupComparator it is!] so I'll add its value to the key's
iterable of values.
etc.
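
A rough sketch of such a grouping comparator, assuming the book's IntPair key
with a getFirst() accessor for the year (in the old API it is registered with
conf.setOutputValueGroupingComparator(), in the new API with
job.setGroupingComparatorClass()):

// uses org.apache.hadoop.io.WritableComparator / WritableComparable
public static class GroupComparator extends WritableComparator {
  protected GroupComparator() {
    super(IntPair.class, true);     // create key instances so compare() sees deserialized IntPairs
  }
  @Override
  public int compare(WritableComparable a, WritableComparable b) {
    int yearA = ((IntPair) a).getFirst();
    int yearB = ((IntPair) b).getFirst();
    // Only the year matters: (1900,35) and (1900,34) compare equal,
    // so their values land in the same reduce group.
    return (yearA < yearB) ? -1 : (yearA == yearB ? 0 : 1);
  }
}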


Re: ivy download error while building mumak

2011-08-03 Thread madhu phatak
Maybe Maven is not able to connect to the central repository because of the proxy.

On Fri, Jul 29, 2011 at 2:54 PM, Arun K arunk...@gmail.com wrote:

 Hi all !

  I have downloaded hadoop-0.21. I am behind my college proxy.
  I get the following error while building mumak :

 $cd /home/arun/Documents/hadoop-0.21.0/mapred
 $ant package
 Buildfile: build.xml

 clover.setup:

 clover.info:
 [echo]
 [echo]  Clover not found. Code coverage reports disabled.
 [echo]

 clover:

 ivy-download:
  [get] Getting:
 http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-
 2.1.0.jar
  [get] To: /home/arun/Documents/hadoop-0.21.0/mapred/ivy/ivy-2.1.0.jar
  [get] Error getting
 http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar to
 /home/arun/Documents/hadoop-0.21.0/mapred/ivy/ivy-2.1.0.jar

 Any help ?

 Thanks,
 Arun K




-- 
Join me at http://hadoopworkshop.eventbrite.com/


TotalOrderPartitioner with new api - help

2011-08-03 Thread Sofia Georgiakaki
Good evening,



I would like to ask you a question regarding the use of TotalOrderPartitioner.
I am working on my diploma thesis, and I need to use the TotalOrderPartitioner
(with the InputSampler of course) under Hadoop 0.20.2.

In order to use it, I need to apply the patch
(https://issues.apache.org/jira/browse/MAPREDUCE-366), but it fails for some
reason.

If I am correct, the patch modifies the TotalOrderPartitioner & InputSampler
classes in the org.apache.hadoop.mapred.lib package in order to deprecate them,
and then it specifies 2 new classes to be used: TotalOrderPartitioner &
InputSampler in org.apache.hadoop.mapreduce.lib.partitioner, using the new API.


I would like to ask, if someone has successfully applied the patch. Could he 
send me the new classes 
(TotalOrderPartitioner and InputSampler) from their hadoop installation, after 
the patch is applied? (it affects the 2 classes 
both in org.apache.hadoop.mapred.lib and 
org.apache.hadoop.mapreduce.lib.partitioner packages). Or at least you 
could suggest another solution?


I hope this will not consume too much of your time. I apologize for the
inconvenience, but I need these two classes in order to finish my diploma
thesis, and I don't know whom I should ask for help.


Thank you very much in advance,
Sofia Georgiakaki
undergraduate student
department of Electronic & Computer Engineering
Technical University of Crete, Greece

Re: how to get all different values for each key

2011-08-03 Thread Harsh J
Secondary sort is the way to go. Easier to dedup a sorted input set.
Although you can also try to filter in map and combine phases to a
safe extent possible (sets, etc.), to speed up the process and reduce
data transfers.
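
A sketch of the reduce side of that approach: assuming the job is configured
with a secondary sort (composite key plus grouping comparator) so each key's
LongWritable values arrive already sorted, duplicates are adjacent and a single
"previous value" check replaces the HashSet:

public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
    throws IOException, InterruptedException {
  long prev = 0;
  boolean first = true;
  for (LongWritable v : values) {
    if (first || v.get() != prev) {    // values are sorted, so a change means a new distinct uid
      context.write(key, v);           // or collect into a LongsWritable, as in the original code
      first = false;
    }
    prev = v.get();
  }
}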

On Wed, Aug 3, 2011 at 4:07 PM, Jianxin Wang wangjx...@gmail.com wrote:
 thanks! Matthew :

     How about using SecondarySort to get <key, values>, where the values are
 sorted for every key, and then traverse the sorted values to get all unique
 values?

    I am not sure which way is more efficient. I suspect HashSet is a
 complicated data structure.
 2011/8/3 Matthew John tmatthewjohn1...@gmail.com

 Hey,

 I feel HashSet is a good method to dedup. To increase the overall
 efficiency
 you could also look into Combiner running the same Reducer code. That would
 ensure less data in the sort-shuffle phase.

 Regards,
 Matthew

 On Wed, Aug 3, 2011 at 11:52 AM, Jianxin Wang wangjx...@gmail.com wrote:

  hi,harsh
      After map, I can get all values for one key, but I want dedup these
  values, only get all unique values. now I just do it like the image.
 
      I think the following code is not efficient.(using a HashSet to
 dedup)
  Thanks:)
 
  private static class MyReducer extends
      Reducer<LongWritable, LongWritable, LongWritable, LongsWritable>
  {
      HashSet<Long> uids = new HashSet<Long>();
      LongsWritable unique_uids = new LongsWritable();
      public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
              throws IOException, InterruptedException
      {
          uids.clear();
          for (LongWritable v : values)
          {
              uids.add(v.get());
          }
          int size = uids.size();
          long[] l = new long[size];
          int i = 0;
          for (long uid : uids)
          {
              l[i] = uid;
              i++;
          }
          unique_uids.set(l);
          context.write(key, unique_uids);
      }
  }
 
 
  2011/8/3 Harsh J ha...@cloudera.com
 
  Use MapReduce :)
 
  If map output: (key, value)
  Then reduce input becomes: (key, [iterator of values across all maps
  with (key, value)])
 
  I believe this is very similar to the wordcount example, but minus the
  summing. For a given key, you get all the values that carry that key
  in the reducer. Have you tried to run a simple program to achieve this
  before asking? Or is something specifically not working?
 
  On Wed, Aug 3, 2011 at 9:20 AM, Jianxin Wang wangjx...@gmail.com
 wrote:
   HI,
     I have many <key,value> pairs now, and want to get all different
  values
   for each key, which way is efficient for this work.
  
     such as input : 1,2 1,3 1,4 1,3 2,1 2,2
     output: 1,2/3/4 2,1/2
  
     Thanks!
  
   walter
  
 
 
 
  --
  Harsh J
 
 
 





-- 
Harsh J


Re: TotalOrderPartitioner with new api - help

2011-08-03 Thread Harsh J
Sofia,

I'd recommend using the old (actually, stable) API for development
right now, when using 0.20.2. Do not be confused by the deprecation
marks since it has been un-deprecated for later releases. Using the
stable API should rid you of the trouble of patching the whole thing
up.

I.e., use JobConf+JobClient+'mapred'-package to build and run jobs
instead of the 'Job' class and 'mapreduce' package.
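
Roughly, the stable-API wiring looks like the sketch below, using the classes
already shipped in 0.20.2 under org.apache.hadoop.mapred.lib (paths, key/value
types and sampler parameters here are placeholders, so double-check the calls
against the 0.20.2 javadocs):

// imports: org.apache.hadoop.fs.Path, org.apache.hadoop.io.Text,
//          org.apache.hadoop.mapred.*, org.apache.hadoop.mapred.lib.InputSampler,
//          org.apache.hadoop.mapred.lib.TotalOrderPartitioner
JobConf conf = new JobConf(MyJob.class);                 // MyJob is a placeholder driver class
conf.setNumReduceTasks(10);
conf.setPartitionerClass(TotalOrderPartitioner.class);

Path partitionFile = new Path("/tmp/_partitions.lst");   // placeholder path
TotalOrderPartitioner.setPartitionFile(conf, partitionFile);

// Sample roughly 1% of the input (at most 1000 samples from at most 10 splits)
// and write the split points the partitioner will read.
InputSampler.Sampler<Text, Text> sampler =
    new InputSampler.RandomSampler<Text, Text>(0.01, 1000, 10);
InputSampler.writePartitionFile(conf, sampler);

JobClient.runJob(conf);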

On Wed, Aug 3, 2011 at 6:05 PM, Sofia Georgiakaki
geosofie_...@yahoo.com wrote:
 Good evening,



 I would like to ask you a question regarding the use of TotalOrderPartitioner.
 I am working on my diploma thesis, and I need to use the
 TotalOrderPartitioner (with the InputSampler of course), under Hadoop
 0.20.2

 In
 order to use it, I need to apply the patch
 (https://issues.apache.org/jira/browse/MAPREDUCE-366), but
 it fails for some reason.

  If I am correct, the patch modifies the TotalOrderPartitioner & InputSampler
  classes in the org.apache.hadoop.mapred.lib package, in order to deprecate
  them and then it specifies 2 new classes to be used: TotalOrderPartitioner &
  InputSampler in org.apache.hadoop.mapreduce.lib.partitioner, using the new
  API.


 I would like to ask, if someone has successfully applied the patch. Could he 
 send me the new classes
 (TotalOrderPartitioner and InputSampler) from their hadoop installation, 
 after the patch is applied? (it affects the 2 classes
 both in org.apache.hadoop.mapred.lib and
 org.apache.hadoop.mapreduce.lib.partitioner packages). Or at least you
 could suggest another solution?


 I hope this will not consume your time. I apologize for the inconvenience, 
 but I
 need these two classes in order to finish my diploma thesis, and I don't know 
 from who I should ask for help.


 Thank you very much in advance,
 Sofia Georgiakaki
 undergraduate student
  department of Electronic & Computer Engineering
 Technical University of Crete, Greece



-- 
Harsh J


Re: TotalOrderPartitioner with new api - help

2011-08-03 Thread Sofia Georgiakaki
Thank you for your reply.

This is what the creator of the patch also recommended.
The problem is that I have already developed the project using the new API (I
didn't know about the problems), so it won't be so easy to convert the whole
job. In addition, I'm nervous wondering whether the code will run after these
changes... Aren't those classes in the old API deprecated? If I should apply a
patch to deprecate them, it would not be a solution, since the code will be
tested on the cluster at my university and I could not apply such a patch
there, I suppose.

In addition, it is possible that the cluster will be updated to Hadoop
0.20.203. Will I have a problem using the old API then?


Hadoop is confusing, I say.

Thank you,
Sofia Georgiakaki

RE: TotalOrderPartitioner with new api - help

2011-08-03 Thread Janarthanan, Maheshwaran (CDS - San Bruno)
Please unsubscribe me.

-Original Message-
From: Sofia Georgiakaki [mailto:geosofie_...@yahoo.com] 
Sent: Wednesday, August 03, 2011 9:42 AM
To: common-user@hadoop.apache.org
Subject: Re: TotalOrderPartitioner with new api - help

Thank you for your reply.

This is what the creator of the patch also recommended.
The problem is that I have already developed the project using the new API (I
didn't know about the problems), so it won't be so easy to convert the whole
job. In addition, I'm nervous wondering whether the code will run after these
changes... Aren't those classes in the old API deprecated? If I should apply a
patch to deprecate them, it would not be a solution, since the code will be
tested on the cluster at my university and I could not apply such a patch
there, I suppose.

In addition, it is possible that the cluster will be updated to Hadoop
0.20.203. Will I have a problem using the old API then?


Hadoop is confusing, I say.

Thank you,
Sofia Georgiakaki


Re: YCSB Benchmarking for HBase

2011-08-03 Thread Edward Capriolo
On Wed, Aug 3, 2011 at 6:10 AM, praveenesh kumar praveen...@gmail.comwrote:

 Hi,

 Anyone working on YCSB (Yahoo Cloud Service Benchmarking) for HBase ??

  I am trying to run it, but it's giving me an error:

 $ java -cp build/ycsb.jar com.yahoo.ycsb.CommandLine -db
 com.yahoo.ycsb.db.HBaseClient

 YCSB Command Line client
 Type help for command line help
 Start with -help for usage info
  Exception in thread "main" java.lang.NoClassDefFoundError:
 org/apache/hadoop/conf/Configuration
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2406)
at java.lang.Class.getConstructor0(Class.java:2716)
at java.lang.Class.newInstance0(Class.java:343)
at java.lang.Class.newInstance(Class.java:325)
at com.yahoo.ycsb.CommandLine.main(Unknown Source)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.conf.Configuration
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
... 6 more

  From the error, it seems like it's not able to find the hadoop-core jar file,
  but it's already on the class path.
 Has anyone worked on YCSB with hbase ?

 Thanks,
 Praveenesh



I just did
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/ycsb_cassandra_0_7_6.

For hbase I followed the steps here:
http://blog.lars-francke.de/2010/08/16/performance-testing-hbase-using-ycsb/

I also followed the comment in the bottom to make sure the hbase-site.xml
was on the classpath.

Startup script looks like this:
CP=build/ycsb.jar:db/hbase/conf/
for i in db/hbase/lib/* ; do
  CP=$CP:${i}
done
# -load  load the workload
# -t     run the workload
java -cp $CP com.yahoo.ycsb.Client -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloadb \


Re: Kill Task Programmatically

2011-08-03 Thread Aleksandr Elbakyan
Hello,

You can just throw a runtime exception. In that case it will fail :)

Regards,
Aleksandr 
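
For instance, a minimal sketch of that fail-fast idea inside a mapper; the
hostname check is just a placeholder for whatever condition identifies the
unusable machine:

public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
  String host = java.net.InetAddress.getLocalHost().getHostName();
  if (host.startsWith("bad-node")) {                          // placeholder condition
    // the attempt fails immediately and is rescheduled elsewhere (up to the retry limit)
    throw new RuntimeException("refusing to run on " + host);
  }
  // ... normal map logic ...
}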

--- On Wed, 8/3/11, Adam Shook ash...@clearedgeit.com wrote:

From: Adam Shook ash...@clearedgeit.com
Subject: Kill Task Programmatically
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Wednesday, August 3, 2011, 3:33 PM

Is there any way I can programmatically kill or fail a task, preferably from 
inside a Mapper or Reducer?

At any time during a map or reduce task, I have a use case where I know it 
won't succeed based solely on the machine it is running on.  It is rare, but I 
would prefer to kill the task and have Hadoop start it up on a different 
machine as usual instead of waiting for the 10 minute default timeout.

I suppose the speculative execution could take care of it, but I would rather 
not rely on it if I am able to kill it myself.

Thanks,
Adam


RE: Kill Task Programmatically

2011-08-03 Thread Devaraj K

Adam,

   You can use the RunningJob.killTask(TaskAttemptID taskId, boolean shouldFail)
API to kill the task.

Clients can get hold of a RunningJob via the JobClient and then use the
running job for killing the task, etc.


Refer API doc :
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Ru
nningJob.html#killTask(org.apache.hadoop.mapred.TaskAttemptID, boolean)


Devaraj K 
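
A hedged sketch of that client-side route (old mapred API; the job id and task
attempt id strings are placeholders):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskAttemptID;

public class TaskKiller {
  public static void main(String[] args) throws Exception {
    JobClient client = new JobClient(new JobConf());
    RunningJob job = client.getJob(JobID.forName("job_201108030000_0001"));      // placeholder id
    TaskAttemptID attempt =
        TaskAttemptID.forName("attempt_201108030000_0001_m_000003_0");           // placeholder id
    job.killTask(attempt, false);   // shouldFail = false: kill (not fail) so the attempt is retried
  }
}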

-Original Message-
From: Aleksandr Elbakyan [mailto:ramal...@yahoo.com] 
Sent: Thursday, August 04, 2011 5:10 AM
To: common-user@hadoop.apache.org
Subject: Re: Kill Task Programmatically

Hello,

You can just throw a runtime exception. In that case it will fail :)

Regards,
Aleksandr 

--- On Wed, 8/3/11, Adam Shook ash...@clearedgeit.com wrote:

From: Adam Shook ash...@clearedgeit.com
Subject: Kill Task Programmatically
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Wednesday, August 3, 2011, 3:33 PM

Is there any way I can programmatically kill or fail a task, preferably from
inside a Mapper or Reducer?

At any time during a map or reduce task, I have a use case where I know it
won't succeed based solely on the machine it is running on.  It is rare, but
I would prefer to kill the task and have Hadoop start it up on a different
machine as usual instead of waiting for the 10 minute default timeout.

I suppose the speculative execution could take care of it, but I would
rather not rely on it if I am able to kill it myself.

Thanks,
Adam