Re: Hadoop project - help needed

2011-07-16 Thread Raja Nagendra Kumar

Hadoop is meant for distributed data storage as the prerequisite, with data
analysis on top of it.
Your case seems to be parallel downloads, storing the images in
different places.
Yes, you could do what you want with the Flickr API in the mapper, feeding it
an empty one-line dummy input; you just would not actually use that input data
within the mapper.
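
A minimal sketch of that idea (DummyInputMapper and fetchGroupInfo are
illustrative placeholders, not names from this thread; the Flickr call stands
in for your own code): the mapper ignores the dummy input line entirely and
just performs the download.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DummyInputMapper
    extends Mapper<LongWritable, Text, Text, NullWritable> {
  @Override
  protected void map(LongWritable offset, Text junkLine, Context context)
      throws IOException, InterruptedException {
    // The dummy input line is never read; the real work is the Flickr API call.
    String photoInfo = fetchGroupInfo();   // hypothetical Flickr-side helper
    context.write(new Text(photoInfo), NullWritable.get());
  }

  private String fetchGroupInfo() {
    return "...";                          // placeholder for your Flickr API code
  }
}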

Regards,
Raja Nagendra Kumar,
C.T.O
www.tejasoft.com
 





Hadoop project - help needed

2011-05-31 Thread parismav

Hello dear forum,
I am working on a project on Apache Hadoop. I am totally new to this
software and I need some help understanding the basic features!

To sum up, for my project I have configured Hadoop so that it runs 3
datanodes on one machine.
The project's main goal is to use both the Flickr API (flickr.com) libraries
and the Hadoop libraries in Java, so that each of the 3 datanodes chooses a
Flickr group and returns photo info from that group.

In order to do that, I have 3 Flickr accounts, each one with a different
API key.

I don't need any help on the Flickr side of the code, of course. But what I
don't understand is how to use the Mapper and Reducer parts of the code.
What input do I have to give the map() function?
Do I have to contain this whole info-downloading process in the map()
function?

In a few words, how do I convert my code so that it runs in a distributed
way on Hadoop?
Thank you!



Re: Hadoop project - help needed

2011-05-31 Thread Robert Evans
Parismav,

So you are more or less trying to scrape some data in a distributed way. There
are several things that you could do; just be careful: I am not sure of the
terms of service for the Flickr APIs, so make sure that you are not violating
them by downloading too much data. You probably want to use the map input data
as command/control for what the mappers do. I would probably put it in a
format like

ACCOUNT INFO\tGROUP INFO\n

Then you could use the N-line input format (NLineInputFormat) so that each
mapper will process one line out of the file.
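
For instance (hypothetical keys and group IDs), a three-line control file
would give each of the three accounts its own map task:

apiKey1\tgroupIdA
apiKey2\tgroupIdB
apiKey3\tgroupIdC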

The mapper would then look something like this (just pseudocode):

Mapper<LongWritable, Text, ?, ?> {
  map(LongWritable offset, Text line, ...) {
    String[] parts = line.toString().split("\t");
    openConnection(parts[0]);                    // authenticate with the account info
    GroupData gd = getDataAboutGroup(parts[1]);  // fetch the group's photo info
    ...
  }
}
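
A minimal driver sketch for that setup might look like the following. It
assumes the new org.apache.hadoop.mapreduce API; FlickrScrapeJob and
FlickrMapper are illustrative names, with FlickrMapper along the lines of the
pseudocode above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class FlickrScrapeJob {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "flickr-scrape");
    job.setJarByClass(FlickrScrapeJob.class);
    job.setMapperClass(FlickrMapper.class);       // hypothetical mapper, as sketched above
    job.setNumReduceTasks(0);                     // map-only job: nothing to aggregate
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1); // one control line per map task
    NLineInputFormat.addInputPath(job, new Path(args[0]));
    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With setNumLinesPerSplit(job, 1), the three-line control file above yields
exactly three map tasks, one per Flickr account.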

I would probably not bother with a reducer if all you are doing is pulling down
data. Also, the output format you choose really depends on the type of data you
are downloading and how you want to use that data later. For example, if you
want to download the actual pictures, then you probably want to use a
SequenceFile format or some other binary format, because converting a picture
to text can be very costly.
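
As a rough sketch of that last point (DownloadedPhoto and fetchGroupPhotos
are hypothetical stand-ins for the Flickr-side code; pair this mapper with
SequenceFileOutputFormat so the picture bytes stay binary):

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class PhotoDownloadMapper
    extends Mapper<LongWritable, Text, Text, BytesWritable> {
  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    String[] parts = line.toString().split("\t");  // ACCOUNT INFO \t GROUP INFO
    for (DownloadedPhoto p : fetchGroupPhotos(parts[0], parts[1])) {
      // Emitting raw JPEG bytes as BytesWritable avoids any lossy or
      // expensive conversion of the picture to text.
      context.write(new Text(p.id), new BytesWritable(p.bytes));
    }
  }

  // Hypothetical stand-ins for the Flickr-side code the poster already has.
  static class DownloadedPhoto { String id; byte[] bytes; }
  private List<DownloadedPhoto> fetchGroupPhotos(String account, String group) {
    return java.util.Collections.emptyList();      // placeholder
  }
}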

--Bobby Evans





Re: Hadoop project - help needed

2011-05-31 Thread jagaran das
Hi,

To be very precise: the input to the mapper should be the data you want to
filter, on the basis of which you want to do the aggregation.
The reducer is where you aggregate the output from the mappers.

Check the WordCount example in Hadoop; it can help you understand the basic
concepts.
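
For reference, here is the heart of that example, condensed from the Hadoop
distribution (new-API style): the mapper emits a (word, 1) pair per token,
and the reducer sums the counts for each word.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {
  public static class TokenizerMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable offset, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer tokens = new StringTokenizer(value.toString());
      while (tokens.hasMoreTokens()) {
        word.set(tokens.nextToken());
        context.write(word, ONE);                // emit (word, 1) from the mapper
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();                          // aggregation happens in the reducer
      }
      context.write(key, new IntWritable(sum));
    }
  }
}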

Cheers,
Jagaran 



