Re: Running a mapreduce class in another Java program

2013-12-06 Thread Anseh Danesh
Thanks for the reply. I imported it into my second program's library, but I
don't know how to call the methods of my MR class (for example, a WordCount
program) from the second program. I do MYProg myprog = new MYProg(); and now
I should call a mapreduce method. What method should I call to execute the
first mapreduce class?
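
A minimal sketch of the usual way to do this, assuming the driver class of the
first jar (MYProg is just the placeholder name from the question) implements
the standard Tool interface, as WordCount-style drivers typically do:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class SecondProgram {
    public static void main(String[] args) throws Exception {
        // ToolRunner invokes MYProg.run(args), which builds and submits the
        // MapReduce job exactly as running the jar on its own would.
        // The input/output paths here are illustrative.
        int exit = ToolRunner.run(new Configuration(), new MYProg(),
                new String[] { "in", "out" });
        System.exit(exit);
    }
}

If the driver only has a static main, calling MYProg.main(new String[]{...})
directly also works, as long as the first jar is on the classpath.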


On Fri, Dec 6, 2013 at 9:01 PM, Anh Pham  wrote:

> in your second Java program's source file, import the classes you want from the 1st
> mapreduce jar:
>
> import packagename.YourMRClass;
>
> ...
>
>
> then run with the first jar on the classpath:
> java -cp //yourMRClass.jar yourSecondClass
>
>
>
> On Fri, Dec 6, 2013 at 11:30 AM, Anseh Danesh wrote:
>
>> Hi all.
>> I wrote a mapreduce class and created a jar file from it. Now I want to
>> use this jar in another Java program. Can anyone please tell me how to do
>> this? I imported the jar file into my second Java program and now I want
>> to call its methods. How can I do this? Thanks.
>>
>>
>


Fwd: Running a mapreduce class in another Java program

2013-12-06 Thread Anseh Danesh
Hi all.
I wrote a mapreduce class and created a jar file from it. Now I want to use
this jar in another Java program. Can anyone please tell me how to do this?
I imported the jar file into my second Java program and now I want to call
its methods. How can I do this? Thanks.


Re: map phase does not read intermediate results with SequenceFileInputFormat

2013-10-25 Thread Anseh Danesh
yes.. thanks for the reply..
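
One way to check what job1 actually wrote, along the lines Robin suggests
below; a sketch assuming the Hadoop 1.x API and the /tmp/intermediate1 output
path from the code in the original message (the part file name may differ):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class DumpJob1Output {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // part-r-00000 is the typical name with a single reducer
        Path part = new Path("/tmp/intermediate1/part-r-00000");
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, part, conf);
        Text key = new Text();
        DoubleWritable value = new DoubleWritable();
        while (reader.next(key, value)) {   // print every key/value pair
            System.out.println(key + "\t" + value.get());
        }
        reader.close();
    }
}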


On Fri, Oct 25, 2013 at 1:46 PM, Dieter De Witte  wrote:

> The question is also on stackoverflow; the problem is that she divides by
> zero in the second mapper, I think (the logs show that both jobs have a
> data flow).
>
>
> 2013/10/25 Robin East 
>
>> Hi
>>
>> Are you sure job1 created output where you expected it and in the format
>> you expect? Have you tested job2 on its own with some hand-crafted test
>> data? Is the output from job1 consistent with your hand-crafted test data
>> for job2?
>>
>> regards
>> On 25 Oct 2013, at 06:46, Anseh Danesh  wrote:
>>
>> Hi all.
>>
>> I have a mapreduce program with two jobs. The second job's key and value
>> come from the first job's output, but I think the second map does not get
>> the result from the first job. In other words, I think my second job did
>> not read the output of my first job. What should I do?
>>
>> Here is the code:
>>
>> [long code listing snipped; see the full original message below]

map phase does not read intermediate results with SequenceFileInputFormat

2013-10-24 Thread Anseh Danesh
Hi all.

I have a mapreduce program with two jobs. The second job's key and value
come from the first job's output, but I think the second map does not get
the result from the first job. In other words, I think my second job did
not read the output of my first job. What should I do?

Here is the code:

public class dewpoint extends Configured implements Tool
{
  private static final Logger logger = LoggerFactory.getLogger(dewpoint.class);

static final String KEYSPACE = "weather";
static final String COLUMN_FAMILY = "user";
private static final String OUTPUT_PATH1 = "/tmp/intermediate1";
private static final String OUTPUT_PATH2 = "/tmp/intermediate2";
private static final String OUTPUT_PATH3 = "/tmp/intermediate3";
private static final String INPUT_PATH1 = "/tmp/intermediate1";

public static void main(String[] args) throws Exception
{

ToolRunner.run(new Configuration(), new dewpoint(), args);
System.exit(0);
}

///

public static class dpmap1 extends Mapper<Map<String, ByteBuffer>,
Map<String, ByteBuffer>, Text, DoubleWritable>
{
DoubleWritable val1 = new DoubleWritable();
Text word = new Text();
String date;
float temp;
public void map(Map<String, ByteBuffer> keys, Map<String, ByteBuffer>
columns, Context context) throws IOException,
InterruptedException
{

 for (Entry<String, ByteBuffer> key : keys.entrySet())
 {
 //System.out.println(key.getKey());
 if (!"date".equals(key.getKey()))
 continue;
 date = ByteBufferUtil.string(key.getValue());
 word.set(date);
 }


for (Entry<String, ByteBuffer> column : columns.entrySet())
{
if (!"temprature".equals(column.getKey()))
continue;
temp = ByteBufferUtil.toFloat(column.getValue());
val1.set(temp);
//System.out.println(temp);
   }
context.write(word, val1);
}
}

///

public static class dpred1 extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
   public void reduce(Text key, Iterable<DoubleWritable> values,
Context context) throws IOException, InterruptedException
{
double beta = 17.62;
double landa = 243.12;
DoubleWritable result1 = new DoubleWritable();
DoubleWritable result2 = new DoubleWritable();
 for (DoubleWritable val : values){
 //  System.out.println(val.get());
   beta *= val.get();
   landa+=val.get();
   }
 result1.set(beta);
 result2.set(landa);

 context.write(key, result1);
 context.write(key, result2);
 }
}
///

public static class dpmap2 extends Mapper<Text, DoubleWritable, Text, DoubleWritable> {

Text key2 = new Text();
double temp1, temp2 =0;

// note: this signature does not override Mapper.map(Text, DoubleWritable,
// Context); the framework will call the inherited identity map instead
public void map(Text key, Iterable<DoubleWritable> values, Context
context) throws IOException, InterruptedException {
String[] sp = values.toString().split("\t");
for (int i=0; i< sp.length; i+=4)
//key2.set(sp[i]);
System.out.println(sp[i]);
for(int j=1;j< sp.length; j+=4)
temp1 = Double.valueOf(sp[j]);
for (int k=3;k< sp.length; k+=4)
temp2 = Double.valueOf(sp[k]);
// temp1 stays 0.0 unless the loop above assigns it, so this division can
// produce Infinity (the divide-by-zero mentioned in the replies)
context.write(key2, new DoubleWritable(temp2/temp1));

}
}

///


public static class dpred2 extends Reducer<Text, DoubleWritable, Text, DoubleWritable>
{
   public void reduce(Text key, Iterable<DoubleWritable> values,
Context context) throws IOException, InterruptedException
{

   double alpha = 6.112;
double tmp = 0;
DoubleWritable result3 = new DoubleWritable();
 for (DoubleWritable val : values){
 System.out.println(val.get());
 tmp = alpha*(Math.pow(Math.E, val.get()));

 }
 result3.set(tmp);
 context.write(key, result3);


  }
}


///


public int run(String[] args) throws Exception
{

 Job job1 = new Job(getConf(), "DewPoint");
 job1.setJarByClass(dewpoint.class);
 job1.setMapperClass(dpmap1.class);
 job1.setOutputFormatClass(SequenceFileOutputFormat.class);
 job1.setCombinerClass(dpred1.class);
 job1.setReducerClass(dpred1.class);
 job1.setMapOutputKeyClass(Text.class);
 job1.setMapOutputValueClass(DoubleWritable.class);
 job1.setOutputKeyClass(Text.class);
 job1.setOutputValueClass(DoubleWritable.class);
 FileOutputFormat.setOutputPath(job1, new Path(OUTPUT_PATH1));


 job1.setInputFormatClass(CqlPagingInputFormat.class);

 ConfigHelper.setInputRpcPort(job1.getConfiguration(), "9160");
 ConfigHelper.setInputInitialAddress(job1.getConfiguration(), "localhost");
 ConfigHelper.setInputColumnFamily(job1.getConfiguration(),
KEYSPACE, COLUMN_FAMILY);
 ConfigHelper.setInputPartitioner(job1.getConfiguration(),
"Murmur3Partitioner");

 CqlConfigHelper.setInputCQLPageRowSize(job1.getConfiguration(), "3");
 job1.waitForCompletion(true);

 ///
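
The message is cut off above, but for what it's worth, the usual pattern for
chaining the two jobs looks like the sketch below. It assumes job1 writes
Text/DoubleWritable pairs to OUTPUT_PATH1 as a SequenceFile (the names are
taken from the code above; the job2 wiring itself is an illustration, not the
poster's actual code):

// Hypothetical continuation of run(), after the job1 setup above.
Job job2 = new Job(getConf(), "DewPoint2");
job2.setJarByClass(dewpoint.class);
job2.setMapperClass(dpmap2.class);
job2.setReducerClass(dpred2.class);

// Read job1's SequenceFile output: the mapper then receives keys and
// values already typed as Text and DoubleWritable, so its signature
// should be map(Text key, DoubleWritable value, Context context).
job2.setInputFormatClass(SequenceFileInputFormat.class);
FileInputFormat.addInputPath(job2, new Path(INPUT_PATH1));

job2.setMapOutputKeyClass(Text.class);
job2.setMapOutputValueClass(DoubleWritable.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(DoubleWritable.class);
FileOutputFormat.setOutputPath(job2, new Path(OUTPUT_PATH2));

// Only start job2 once job1 has finished writing its output.
if (job1.waitForCompletion(true)) {
    job2.waitForCompletion(true);
}
return 0;

Two details in the original code are also worth a second look: dpred1 is
registered as a combiner, but its beta *= val logic changes the result when it
runs on partial data; and dpmap2's map(Text, Iterable, ...) does not override
Mapper's map(Text, DoubleWritable, ...), so the framework falls back to the
identity mapper and that method is never called.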

Re: number of map and reduce task does not change in M/R program

2013-10-21 Thread Anseh Danesh
Thanks a lot for the reply..


On Mon, Oct 21, 2013 at 10:39 AM, Dieter De Witte wrote:

> Anseh,
>
> Let's assume your job is fully scalable; then it should take 100,000,000 /
> 600,000 times the time of the first run, i.e. 1000/6, about 167 times
> longer. That is the ideal case; in practice it will probably be more like
> 200 times. Also, try using units in your questions, plus scientific
> notation: 10^8 records or 10^8 bytes?
>
> Regards, irW
>
>
> 2013/10/20 Anseh Danesh 
>
>> OK... thanks a lot for the link... it is so useful... ;)
>>
>>
>> On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin  wrote:
>>
>>> Try profiling the job (
>>> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling).
>>> And yeah, the machine specs could be the reason; that's why Hadoop was
>>> invented in the first place ;)
>>>
>>>
>>> On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh wrote:
>>>
>>>> I tried it on a small set of data, about 600,000 records, and it does
>>>> not take too long; the execution time was reasonable. But on the set of
>>>> 100,000,000 records it performs really badly. One more thing: I have 2
>>>> processors in my machine, and I think this amount of data is very big
>>>> for my processor, so it takes too long to process... what do you think?
>>>>
>>>>
>>>> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin wrote:
>>>>
>>>>> Try running the job locally on a small set of the data and see if it
>>>>> takes too long. If so, your map code might have some performance issues.
>>>>>
>>>>>
>>>>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh 
>>>>> wrote:
>>>>>
>>>>>> Hi all.. I have a question. I have a mapreduce program that gets its
>>>>>> input from cassandra. My input is a little big, about 100,000,000
>>>>>> records. My problem is that my program takes too long to process it,
>>>>>> but I thought mapreduce was good and fast for large volumes of data,
>>>>>> so I think maybe I have a problem with the number of map and reduce
>>>>>> tasks. I set the number of map and reduce tasks with JobConf, with
>>>>>> Job, and also in conf/mapred-site.xml, but I don't see any change. In
>>>>>> my logs, at first there is map 0% reduce 0%, and after about 2 hours
>>>>>> of work it shows map 1% reduce 0%..!! What should I do? Please help
>>>>>> me, I am really confused...
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>


Re: number of map and reduce task does not change in M/R program

2013-10-20 Thread Anseh Danesh
OK... thanks a lot for the link... it is so useful... ;)


On Sun, Oct 20, 2013 at 6:59 PM, Amr Shahin  wrote:

> Try profiling the job (
> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling).
> And yeah, the machine specs could be the reason; that's why Hadoop was
> invented in the first place ;)
>
>
> On Sun, Oct 20, 2013 at 8:39 AM, Anseh Danesh wrote:
>
>> I tried it on a small set of data, about 600,000 records, and it does
>> not take too long; the execution time was reasonable. But on the set of
>> 100,000,000 records it performs really badly. One more thing: I have 2
>> processors in my machine, and I think this amount of data is very big
>> for my processor, so it takes too long to process... what do you think?
>>
>>
>> On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin  wrote:
>>
>>> Try running the job locally on a small set of the data and see if it
>>> takes too long. If so, your map code might have some performance issues.
>>>
>>>
>>> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh wrote:
>>>
>>>> Hi all.. I have a question. I have a mapreduce program that gets its
>>>> input from cassandra. My input is a little big, about 100,000,000
>>>> records. My problem is that my program takes too long to process it,
>>>> but I thought mapreduce was good and fast for large volumes of data,
>>>> so I think maybe I have a problem with the number of map and reduce
>>>> tasks. I set the number of map and reduce tasks with JobConf, with
>>>> Job, and also in conf/mapred-site.xml, but I don't see any change. In
>>>> my logs, at first there is map 0% reduce 0%, and after about 2 hours
>>>> of work it shows map 1% reduce 0%..!! What should I do? Please help
>>>> me, I am really confused...
>>>>
>>>
>>>
>>
>
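
For reference, the profiling switch from the tutorial linked above can be
turned on from the driver. A minimal sketch using the Hadoop 1.x JobConf API
(the task ranges are illustrative):

import org.apache.hadoop.mapred.JobConf;

public class ProfilingExample {
    public static JobConf withProfiling(JobConf conf) {
        conf.setProfileEnabled(true);           // turn the task profiler on
        conf.setProfileTaskRange(true, "0-2");  // profile the first three map tasks
        conf.setProfileTaskRange(false, "0-2"); // and the first three reduce tasks
        return conf;
    }
}

With the newer Job API, the equivalent is setting mapred.task.profile (plus
mapred.task.profile.maps / mapred.task.profile.reduces) on the job's
Configuration; the profile output lands in the per-task logs.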


Re: number of map and reduce task does not change in M/R program

2013-10-19 Thread Anseh Danesh
I tried it on a small set of data, about 600,000 records, and it does not
take too long; the execution time was reasonable. But on the set of
100,000,000 records it performs really badly. One more thing: I have 2
processors in my machine, and I think this amount of data is very big for
my processor, so it takes too long to process... what do you think?


On Sun, Oct 20, 2013 at 1:49 AM, Amr Shahin  wrote:

> Try running the job locally on a small set of the data and see if it takes
> too long. If so, your map code might have some performance issues.
>
>
> On Sat, Oct 19, 2013 at 9:08 AM, Anseh Danesh wrote:
>
>> Hi all.. I have a question. I have a mapreduce program that gets its
>> input from cassandra. My input is a little big, about 100,000,000
>> records. My problem is that my program takes too long to process it,
>> but I thought mapreduce was good and fast for large volumes of data,
>> so I think maybe I have a problem with the number of map and reduce
>> tasks. I set the number of map and reduce tasks with JobConf, with
>> Job, and also in conf/mapred-site.xml, but I don't see any change. In
>> my logs, at first there is map 0% reduce 0%, and after about 2 hours
>> of work it shows map 1% reduce 0%..!! What should I do? Please help
>> me, I am really confused...
>>
>
>


number of map and reduce task does not change in M/R program

2013-10-18 Thread Anseh Danesh
Hi all.. I have a question. I have a mapreduce program that gets its input
from cassandra. My input is a little big, about 100,000,000 records. My
problem is that my program takes too long to process it, but I thought
mapreduce was good and fast for large volumes of data, so I think maybe I
have a problem with the number of map and reduce tasks. I set the number of
map and reduce tasks with JobConf, with Job, and also in
conf/mapred-site.xml, but I don't see any change. In my logs, at first there
is map 0% reduce 0%, and after about 2 hours of work it shows map 1% reduce
0%..!! What should I do? Please help me, I am really confused...
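
For context, a sketch of the knobs that usually matter here, using the Hadoop
1.x API the thread is working with (values are illustrative): the reducer
count is a hard setting, while the map count only follows from the number of
input splits, which is why setting mapred.map.tasks shows no change.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TaskCountExample {
    public static Job configure(Configuration conf) throws Exception {
        Job job = new Job(conf, "DewPoint");

        // The reducer count is respected exactly.
        job.setNumReduceTasks(4); // illustrative value

        // The map count is NOT set directly: it follows from the input
        // splits. For file-based input formats, lowering the maximum split
        // size yields more (smaller) map tasks. Value in bytes, illustrative.
        job.getConfiguration().setLong("mapred.max.split.size", 16 * 1024 * 1024);

        return job;
    }
}

Since this particular job reads from Cassandra, the splits actually come from
the Cassandra input format, and ConfigHelper.setInputSplitSize(...) (rows per
split) is the corresponding knob for how many map tasks it produces.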