RE: Question about job distribution

2009-07-14 Thread Amogh Vasekar
Confused. What do you mean by "query be distributed over all datanodes or just
1 node"? If your data is small enough that it fits in just one block (and is
replicated by Hadoop), then just one map task will be run (assuming the default
input split).
If the data is spread across multiple blocks, you can make it run on just one
compute node by setting your input split to be large enough (yes, there are use
cases for this, when the whole data set is to be fed to a single mapper). Otherwise,
the job will be scheduled on numerous nodes, with each getting a block / chunk
(per the input split size you set) of your actual data. The nodes picked for
running your job depend on data locality, to reduce network latency.
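As a rough, untested illustration of the "one big split" case, assuming the old
JobConf API and the mapred.min.split.size property that the default
FileInputFormat consults:

import org.apache.hadoop.mapred.JobConf;

public class SingleSplitExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf(SingleSplitExample.class);
    // Raise the minimum split size above any input file size, so each
    // input file becomes exactly one split and is fed to a single mapper.
    conf.setLong("mapred.min.split.size", Long.MAX_VALUE);
    // ... set mapper class, input/output paths, etc., then submit the job.
  }
}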

Thanks,
Amogh

-Original Message-
From: Divij Durve [mailto:divij.t...@gmail.com] 
Sent: Wednesday, July 15, 2009 2:32 AM
To: common-user@hadoop.apache.org; core-u...@hadoop.apache.org
Subject: Question about job distribution

If I have a query that I would normally fire on a database, and I want to fire
it against the data loaded into multiple nodes on Hadoop, will the query be
distributed over all the datanodes so it returns results faster, or will it
just be sent to 1 node? If so, is there a way to get it to distribute the
query instead of sending it to 1 node?
Thanks
Divij


Re: Compression issues!!

2009-07-14 Thread Tarandeep Singh
You can put compressed data on HDFS and run a Map Reduce job on it. But you
should use a codec that supports file splitting, otherwise the whole file will
be read by one mapper. If you have read about the Map Reduce architecture, you
will understand that a map function processes a chunk of data (called a split).
If a file is big and its format supports splitting (e.g. a plain text file where
lines are separated by newlines, or a sequence file), then the big file can be
processed in parallel by multiple mappers (each processing a split of the file).
However, if the compression codec that you use does not support file splitting,
then the whole file will be processed by one mapper and you won't achieve
parallelism.

Check the Hadoop wiki for compression codecs that support file splitting.
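As a rough, untested sketch of one such option -- writing data as a
block-compressed SequenceFile, which stays splittable -- assuming the
0.19/0.20 Java API and a placeholder output path:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;

public class SeqFileWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/user/example/data.seq");   // placeholder path
    // BLOCK compression compresses batches of records, so the file can
    // still be split across mappers, unlike a single raw .gz file.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, LongWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK, new GzipCodec());
    try {
      writer.append(new LongWritable(1), new Text("example record"));
    } finally {
      writer.close();
    }
  }
}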

-Tarandeep

On Tue, Jul 14, 2009 at 10:39 PM, Sugandha Naolekar
wrote:

> Hello!
>
> Few days back, I had asked about the compression of data placed in
> hadoop..I
> did get apt replies as::
>
> Place the data first in HDFS and then compress it, so that the data would
> be
> in sequence files.
>
> But, here my query is, I want to compress the data before placing it in
> HDFS, so that redundancy won't come into picture..!
>
> How to do that...!Also, will I have to use external compression algo. or
> simply api's would solve the purpose?
>
> --
> Regards!
> Sugandha
>


Re: Using Hadoop Index located in contrib folder

2009-07-14 Thread Rakhi Khatwani
Hi Bhushan,
I executed your command and I got the following exception:

java.io.IOException: Error opening job jar:
../contrib/index/hadoop-0.19.1-index.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:91)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.(ZipFile.java:114)
at java.util.jar.JarFile.(JarFile.java:133)
at java.util.jar.JarFile.(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:89)


What could this be due to?

Regards,
Raakhi


On Mon, Jul 13, 2009 at 6:47 PM, bhushan_mahale <
bhushan_mah...@persistent.co.in> wrote:

> Hi Rakhi,
>
> Yes. There is not much information available.
> You can use the following command to run the hadoop-index jar:
>
> bin/hadoop jar -libjars src/contrib/index/lib/lucene-core-2.3.1.jar
> contrib/index/hadoop-0.19.1-index.jar -inputPaths  -outputPath
>  -indexPath  -conf
> src/contrib/index/conf/index-config.xml
>
> I hope this helps.
>
> Regards,
> - Bhushan
>
>
> -Original Message-
> From: Rakhi Khatwani [mailto:rakhi.khatw...@gmail.com]
> Sent: Monday, July 13, 2009 3:07 PM
> To: core-user
> Subject: Using Hadoop Index located in contrib folder
>
> Hi,
>   I was going through hadoop-contrib and came across hadoop-index.jar and
> was wondering how to use it. There is no help online. I would greatly
> appreciate it if I could get a small tutorial. For example, I have a file which
> contains data and I would like to create an index for it. How do I go about it?
>
> Regards
> Raakhi
>
>


Compression issues!!

2009-07-14 Thread Sugandha Naolekar
Hello!

A few days back I asked about the compression of data placed in Hadoop, and I
did get apt replies, along the lines of:

Place the data first in HDFS and then compress it, so that the data would be
in sequence files.

But here my query is: I want to compress the data before placing it in
HDFS, so that redundancy won't come into the picture!

How do I do that? Also, will I have to use an external compression algorithm, or
would the APIs alone solve the purpose?

-- 
Regards!
Sugandha


Re: Cluster configuration issue

2009-07-14 Thread Tamir Kamara
Hi,

In order to apply the patch, use a command like this: "patch -p0 < patchfile", and
then build with ant, e.g. "ant tar" (which will make it easy to
distribute to the nodes).

However, I think that version 0.19.2 is close to being released, so if you
can wait it would be easier and you'll get a few more fixes. Maybe someone
else can comment on when it will be released...

Tamir

On Wed, Jul 15, 2009 at 3:24 AM, Mac Yuki  wrote:

> Thanks for the help!  That bug looks like the one I'm running into and the
> code in my src/ directory matches the older one on the patch svn diff.
>  Here's a beginner question: how do I apply just that patch to my
> installation?  Thanks.Mac
>
>
> On Tue, Jul 14, 2009 at 10:03 AM, Tamir Kamara  >wrote:
>
> > Hi,
> >
> > To restart hadoop on a specific node use a command like this:
> > "hadoop-daemon.sh stop tasktracker" and after that the same command with
> > start. You can also do the same with the datanode but it doesn't look
> like
> > there's a problem there.
> >
> > I had the same problem with missing slots but I don't remember if it
> > happened
> > on maps. The fix in my case was this patch:
> > https://issues.apache.org/jira/browse/HADOOP-5269
> >
> >
> > Tamir
> >
> > On Tue, Jul 14, 2009 at 7:25 PM, KTM  wrote:
> >
> > > Hi, I'm running Hadoop 0.19.1 on a cluster with 8 machines, 7 of which
> > are
> > > used as slaves and the other the master, each with 2 dual-core AMD CPUs
> > and
> > > generous amounts of RAM.  I am running map-only jobs and have the
> slaves
> > > set
> > > up to have 4 mappers each, for a total of 28 available mappers.  When I
> > > first start up my cluster, I am able to use all 28 mappers.  However,
> > after
> > > a short bit of time (~12 hours), jobs that I submit start using fewer
> > > mappers.  I restarted my cluster last night, and currently only 19
> > mappers
> > > are running tasks even though more tasks are pending, with at least 2
> > tasks
> > > running per machine - so no machine has gone down.  I have checked that
> > the
> > > unused cores are actually sitting idle.  Any ideas for why this is
> > > happening?  Is there a way to restart Hadoop on the individual slaves?
> > >  Thanks! Mac
> > >
> >
>


Re: Hardware Manufacturer

2009-07-14 Thread Chris Collins
I had relatively good luck with HP but incredibly bad times with
Dell.  Numerous companies I have worked for have had Dell and nothing
but trouble to go with it:

- Machines that caught on fire
- SATA RAID controllers that use a mux and give poor IO (tech support
was non-existent)
- A recent HW blowup took almost a month to replace when it was
supposedly covered by a next-day service contract
- Problems with onboard SCSI that lose data
- LAN controllers that would not function with some of the most common
network switches

Guess I don't have anything good to say about them.


On Jul 14, 2009, at 3:53 PM, Ted Dunning wrote:


I have used Dell, Silicon Mechanics and HP.

I had very good results with Dell and Silicon Mechanics.

Results with HP were marginal, but that was for database machines with
veritas file systems running on EMC hardware.  I will avoid that sort of
machine like the plague in the future due to high cost and LooNG install
cycle.

I evaluated Sun, but was never able to justify them since the late 90's in
spite of my sentimental attachments to their product (I bought my first Sun
in 1984).

I have used various off-brand machines with horrifically bad results.

On Tue, Jul 14, 2009 at 2:28 PM, Ryan Smith >wrote:



Kris,

We are using EC2 already, I need information on a hardware mfgr for a
*real-life* cluster now, thanks.  :)





On Tue, Jul 14, 2009 at 5:18 PM, Kris Jirapinyo
wrote:


Why don't you try Amazon EC2? :)

On Tue, Jul 14, 2009 at 2:12 PM, Ryan Smith 
wrote:


I'm having problems dealing with my server mfgr atm.  Is there a  
good

mfgr

to go with?

Any advice is helpful, thanks.

-Ryan









--
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)






Re: access Configuration object in Partioner??

2009-07-14 Thread Owen O'Malley
On Tue, Jul 14, 2009 at 5:28 AM, Tom White  wrote:

> Hi Jianmin,
>
> Sorry - I (incorrectly) assumed you were using the old API.
> Partitioners don't yet work with the new API (see
> https://issues.apache.org/jira/browse/MAPREDUCE-565).


MAPREDUCE-565 just went in, so new API partitioners will be fixed in 0.20.1,
which will likely come out next week.

-- Owen


Re: access Configuration object in Partioner??

2009-07-14 Thread Amareshwari Sriramadasu
Your partitioner can implement Configurable (setConf(conf) and 
getConf()). Please see 
org.apache.hadoop.mapreduce.lib.partition.BinaryPartitioner
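For example, a rough, untested sketch of such a partitioner under the new
(org.apache.hadoop.mapreduce) API -- the property name
"example.partition.offset" is just a made-up illustration:

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class ConfigAwarePartitioner extends Partitioner<Text, IntWritable>
    implements Configurable {

  private Configuration conf;
  private int offset;

  public void setConf(Configuration conf) {
    this.conf = conf;
    // Read whatever job parameters the partitioner needs from the configuration.
    this.offset = conf.getInt("example.partition.offset", 0);
  }

  public Configuration getConf() {
    return conf;
  }

  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // Use the configured value when computing the partition.
    return ((key.hashCode() + offset) & Integer.MAX_VALUE) % numPartitions;
  }
}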


Thanks
Amareshwari
Jianmin Woo wrote:
Thanks a lot for your information, Tom. 


I am using the org.apache.hadoop.mapreduce.Partitioner in 0.20. It seems that
the org.apache.hadoop.mapred.Partitioner is deprecated and will be removed in
the future.
Do you have some suggestions on this?

Thanks,
Jianmin





From: Tom White 
To: common-user@hadoop.apache.org
Sent: Tuesday, July 14, 2009 6:03:34 PM
Subject: Re: access Configuration object in Partioner??

Hi Jianmin,

Partitioner extends JobConfigurable, so you can implement the
configure() method to access the JobConf.

Hope that helps.

Cheers,
Tom

On Tue, Jul 14, 2009 at 10:27 AM, Jianmin Woo wrote:
  

Hi,

I am considering implementing a Partitioner that needs to access the parameters
in the Configuration of the job. However, there is no straightforward way to do
this. Are there any suggestions?

Thanks,
Jianmin









  
  




RE : Hadoop User Group (Bay Area) - Aug 19th at Yahoo!

2009-07-14 Thread Dekel Tankel
Hi all,

There seem to be people who did not receive this announcement on the
'general' mailing list, so I am posting it here as well.

I'd like to remind everyone that Yahoo! is excited to continue hosting the 
monthly Bay Area Hadoop user groups.

The next meeting is scheduled for Aug 19th (recurring the 3rd Wednesday of 
every month) at Yahoo!'s Sunnyvale campus - 701 1st Ave
Building E - Classrooms 9 and 10.

Agenda and event registration details are coming up shortly.

Looking forward to seeing you there!

Dekel



Re: Hardware Manufacturer

2009-07-14 Thread Chris Collins
I think if you ask 100 people you will get 100 unique answers.  I will
say that at my current company we use Sun hardware.  They offer a
"startup essentials" program if you're a startup, and they can be very
competitive with even the box shifters.  They run Linux and of course
Solaris.  We use their cheap commodity hardware options.  We found a couple
of advantages with these units:

- lights-out management
- can put a lot of memory in even their most basic boxes (the cheap
ECC options too)
- support is great
- seem to be very reliable
- they do have plenty of "exotic" machine offerings if your application calls
for them (we stay away mostly from that, though)


C


On Jul 14, 2009, at 2:28 PM, Ryan Smith wrote:


Kris,

We are using EC2 already, I need information on a hardware mfgr for a
*real-life* cluster now, thanks.  :)





On Tue, Jul 14, 2009 at 5:18 PM, Kris Jirapinyo
wrote:


Why don't you try Amazon EC2? :)

On Tue, Jul 14, 2009 at 2:12 PM, Ryan Smith 
wrote:


I'm having problems dealing with my server mfgr atm.  Is there a  
good

mfgr

to go with?

Any advice is helpful, thanks.

-Ryan





Re: Looking for counterpart of Configure Method

2009-07-14 Thread Jingkei Ly
Aaron answered this question: Mapper implements Closeable, so you should
implement/override #close() in your mapper, and it will be called when the
map task finishes:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Mapper.html
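For example, a rough, untested sketch with the old (org.apache.hadoop.mapred)
API -- the external command path below is only a placeholder:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DaemonMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private Process daemon;   // hypothetical external helper process

  public void configure(JobConf job) {
    try {
      // Start the external helper once, before any map() calls.
      daemon = new ProcessBuilder("/path/to/daemon").start();   // placeholder path
    } catch (IOException e) {
      throw new RuntimeException("could not start daemon", e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    output.collect(new Text("k"), value);   // real work would talk to the daemon
  }

  public void close() throws IOException {
    // Called once when the map task finishes -- the counterpart of configure().
    if (daemon != null) {
      daemon.destroy();
    }
  }
}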

2009/7/14 akhil1988 

>
> Does anyone know the answer to my question, or is there any alternative to
> do
> this?
>
> Thanks,
> Akhil
>
>
>
> akhil1988 wrote:
> >
> > Hi All,
> >
> > Just like the configure method in the Mapper interface, I am looking for its
> > counterpart that will perform the closing operation for a map task. For
> > example, in the configure method I start an external daemon which is used
> > throughout my map task, and when the map task completes, I want to close
> > that daemon. Is there any method that gets called at the end of the map
> > task and can be overridden, just like the configure method?
> >
> > Thanks,
> > Akhil
> >
>
> --
> View this message in context:
> http://www.nabble.com/Looking-for-counterpart-of-Configure-Method-tp24466923p24483501.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


Re: Why is Spilled Records always equal to Map output records

2009-07-14 Thread Mu Qiao
Thanks. It's clear now. :)

On Wed, Jul 15, 2009 at 11:40 AM, Jothi Padmanabhan
wrote:

> It is true, map writes its output to a memory buffer. But when the map
> process is complete, the contents of this buffer are sorted and spilled to
> the disk so that the Task Tracker running on that node can serve these map
> outputs to the requesting reducers.
>
>
> On 7/15/09 7:59 AM, "Mu Qiao"  wrote:
>
> > Thanks. But when I refer to "Hadoop: The Definitive Guide" chapter 6, I
> find
> > that the map writes its outputs to a memory buffer(not to local disk)
> whose
> > size is controlled by io.sort.mb. Only the buffer reaches its threshold,
> it
> > will spill the outputs to local disk. If that is true, I can't see any
> need
> > for the map to store its outputs to disk if the io.sort.mb is large
> enough.
> >
> > On Wed, Jul 15, 2009 at 12:45 AM, Owen O'Malley  >wrote:
> >
> >> There is no requirement that all of the reduces are running while the
> map
> >> is
> >> running. The dataflow is that the map writes its output to local disk
> and
> >> that the reduces pull the map outputs when they need them. There are
> >> threads
> >> handling sorting and spill of the records to disk, but that doesn't
> remove
> >> the need for the map to store its outputs to disk. (Of course, if there
> is
> >> enough ram, the operating system will have the map outputs in its file
> >> cache
> >> and not need to read from disk.)
> >>
> >> It is an interesting question as to what the changes would need to be to
> >> have the maps push to the reduces, but they would be substantial.
> >>
> >> -- Owen
> >>
> >
> >
>
>


-- 
Best wishes,
Qiao Mu


Re: Why is Spilled Records always equal to Map output records

2009-07-14 Thread Jothi Padmanabhan
It is true that the map writes its output to a memory buffer. But when the map
process is complete, the contents of this buffer are sorted and spilled to
disk so that the TaskTracker running on that node can serve these map
outputs to the requesting reducers.


On 7/15/09 7:59 AM, "Mu Qiao"  wrote:

> Thanks. But when I refer to "Hadoop: The Definitive Guide" chapter 6, I find
> that the map writes its outputs to a memory buffer(not to local disk) whose
> size is controlled by io.sort.mb. Only the buffer reaches its threshold, it
> will spill the outputs to local disk. If that is true, I can't see any need
> for the map to store its outputs to disk if the io.sort.mb is large enough.
> 
> On Wed, Jul 15, 2009 at 12:45 AM, Owen O'Malley wrote:
> 
>> There is no requirement that all of the reduces are running while the map
>> is
>> running. The dataflow is that the map writes its output to local disk and
>> that the reduces pull the map outputs when they need them. There are
>> threads
>> handling sorting and spill of the records to disk, but that doesn't remove
>> the need for the map to store its outputs to disk. (Of course, if there is
>> enough ram, the operating system will have the map outputs in its file
>> cache
>> and not need to read from disk.)
>> 
>> It is an interesting question as to what the changes would need to be to
>> have the maps push to the reduces, but they would be substantial.
>> 
>> -- Owen
>> 
> 
> 



Re: Why is Spilled Records always equal to Map output records

2009-07-14 Thread Mu Qiao
Thanks. But when I refer to "Hadoop: The Definitive Guide" chapter 6, I find
that the map writes its outputs to a memory buffer (not to local disk) whose
size is controlled by io.sort.mb. Only when the buffer reaches its threshold does
it spill the outputs to local disk. If that is true, I can't see any need
for the map to store its outputs to disk if io.sort.mb is large enough.

On Wed, Jul 15, 2009 at 12:45 AM, Owen O'Malley wrote:

> There is no requirement that all of the reduces are running while the map
> is
> running. The dataflow is that the map writes its output to local disk and
> that the reduces pull the map outputs when they need them. There are
> threads
> handling sorting and spill of the records to disk, but that doesn't remove
> the need for the map to store its outputs to disk. (Of course, if there is
> enough ram, the operating system will have the map outputs in its file
> cache
> and not need to read from disk.)
>
> It is an interesting question as to what the changes would need to be to
> have the maps push to the reduces, but they would be substantial.
>
> -- Owen
>



-- 
Best wishes,
Qiao Mu


Re: hadoop cluster startup error

2009-07-14 Thread Aaron Kimball
I think you might have the commands backwards. hadoop fs -get will
copy from HDFS to the local filesystem. So it assumes that
/usr/divij/wordcount/input is in HDFS and that you want to write it to
local file "file01".

If you want to put data into HDFS, do the opposite: hadoop fs -put
(localfile) (hdfs_dest)
Also note that whereas Linux/Unix fans type "/usr", Hadoop spells your
home directory "/user/divij" (note the "e" in "user").

As for not starting the JobTracker/TaskTracker: can you look through
the JobTracker and TaskTracker logs in your $HADOOP_HOME/logs
directory (should be named hadoop-divij-jobtracker-gobi.log and
hadoop-divij-tasktracker-gobi.log) and reply to this thread with any
ERROR or FATAL messages present in either?

Thanks,
- Aaron


On Tue, Jul 14, 2009 at 12:47 PM, Divij Durve wrote:
> I have started the cluster and in spite of that I don't get any of the
> processes when I type ps. The only thing it's showing is 2 java processes on
> the name node.
>
> I'm trying to do the word count example, but this is what is happening:
>
> [di...@gobi bin]$ ./hadoop fs -get /usr/divij/wordcount/input ./file01
> [di...@gobi bin]$ ./hadoop fs -get /usr/divij/wordcount/input ./file02
> [di...@gobi bin]$ ./hadoop fs -ls
> ls: Cannot access .: No such file or directory.
> [di...@gobi bin]$ ./hadoop fs -ls /usr/divij/wordcount/input
> [di...@gobi bin]$ ./hadoop dfs -cat >/usr/divij/wordcount/input/file01
> -bash: /usr/divij/wordcount/input/file01: No such file or directory
> [di...@gobi bin]$ ./hadoop dfs -cat /usr/divij/wordcount/input/file01
> cat: File does not exist: /usr/divij/wordcount/input/file01
> [di...@gobi bin]$ ./hadoop fs -get /usr/divij/wordcount/input/ ./file01
> [di...@gobi bin]$ ./hadoop fs -cat /usr/divij/wordcount/input/file01
> cat: File does not exist: /usr/divij/wordcount/input/file01
> [di...@gobi bin]$ ./hadoop dfs -cat /usr/divij/wordcount/input/file01
> cat: File does not exist: /usr/divij/wordcount/input/file01
> [di...@gobi bin]$ ./hadoop fs -test -e /usr/divij/wordcount/input/file01
> [di...@gobi bin]$ ./hadoop fs -tail /usr/divij/wordcount/input/file01
> tail: File does not exist: /usr/divij/wordcount/input/file01
> [di...@gobi bin]$
>
> I really don't know what's wrong. Can anyone help?
>
>
> On Mon, Jul 13, 2009 at 6:15 PM, Aaron Kimball  wrote:
>
>> I'm not convinced anything is wrong with the TaskTracker. Can you run jobs?
>> Does the pi example work? If not, what error does it give?
>>
>> If you're trying to configure your SecondaryNameNode on a different host
>> than your NameNode, you'll need to do some configuration tweaking. I wrote
>> a
>> blog post with instructions on how to do this, here:
>>
>> http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/
>>
>> Good luck!
>> - Aaron
>>
>> On Mon, Jul 13, 2009 at 3:00 PM, Divij Durve  wrote:
>>
>> > here is the log file from the secondary namenode:
>> >
>> > 2009-07-13 17:36:13,274 INFO
>> > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG:
>> > /
>> > STARTUP_MSG: Starting SecondaryNameNode
>> > STARTUP_MSG:   host = kalahari.mssm.edu/127.0.0.1
>> > STARTUP_MSG:   args = []
>> > STARTUP_MSG:   version = 0.19.0
>> > STARTUP_MSG:   build =
>> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r
>> > 713890;
>> > compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
>> > /
>> > 2009-07-13 17:36:13,350 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
>> > Initializing JVM Metrics with processName=SecondaryNameNode,
>> sessionId=null
>> > 2009-07-13 17:36:13,445 INFO
>> org.apache.hadoop.hdfs.server.common.Storage:
>> > Recovering storage directory
>> > /home/divij/hive/build/hadoopcore/hadoop-0.19.0/hadoo
>> > p-tmp/dfs/namesecondary from failed checkpoint.
>> > 2009-07-13 17:36:13,516 INFO org.mortbay.http.HttpServer: Version
>> > Jetty/5.1.4
>> > 2009-07-13 17:36:13,522 INFO org.mortbay.util.Credential: Checking
>> Resource
>> > aliases
>> > 2009-07-13 17:36:13,729 INFO org.mortbay.util.Container: Started
>> > org.mortbay.jetty.servlet.webapplicationhand...@6602e323
>> > 2009-07-13 17:36:13,772 INFO org.mortbay.util.Container: Started
>> > WebApplicationContext[/static,/static]
>> > 2009-07-13 17:36:13,855 INFO org.mortbay.util.Container: Started
>> > org.mortbay.jetty.servlet.webapplicationhand...@77546dbc
>> > 2009-07-13 17:36:13,866 INFO org.mortbay.util.Container: Started
>> > WebApplicationContext[/logs,/logs]
>> > 2009-07-13 17:36:13,945 INFO org.mortbay.jetty.servlet.XMLConfiguration:
>> No
>> > WEB-INF/web.xml in
>> > file:/home/divij/hive/build/hadoopcore/hadoop-0.19.0/webapps/secondary.
>> > Serving files and default/dynamic servlets only
>> > 2009-07-13 17:36:13,946 INFO org.mortbay.util.Container: Started
>> > org.mortbay.jetty.servlet.webapplicationhand...@52c00025
>> > 2009-07-13 17:36:13,960 INFO org.mortbay.util.Container: Start

Re: access Configuration object in Partioner??

2009-07-14 Thread Jianmin Woo
OK, I see. Thanks a lot for the help, Tom.

Best Regards,
Jianmin





From: Tom White 
To: common-user@hadoop.apache.org
Sent: Tuesday, July 14, 2009 8:28:55 PM
Subject: Re: access Configuration object in Partioner??

Hi Jianmin,

Sorry - I (incorrectly) assumed you were using the old API.
Partitioners don't yet work with the new API (see
https://issues.apache.org/jira/browse/MAPREDUCE-565). However, when
they do you can make your Partitioner implement Configurable (by
extending Configured, for example), and this will give you access to
the job configuration, since the framework will set it for you on the
partitioner.

Cheers
Tom

On Tue, Jul 14, 2009 at 12:46 PM, Jianmin Woo wrote:
> Thanks a lot for your information, Tom.
>
> I am using the org.apache.hadoop.mapreduce.Partitioner in 0.20. It seems that 
> the org.apache.hadoop.mapred.Partitioner is deprecated and will be removed in 
> the future.
> Do you have some suggestions on this?
>
> Thanks,
> Jianmin
>
>
>
>
> 
> From: Tom White 
> To: common-user@hadoop.apache.org
> Sent: Tuesday, July 14, 2009 6:03:34 PM
> Subject: Re: access Configuration object in Partioner??
>
> Hi Jianmin,
>
> Partitioner extends JobConfigurable, so you can implement the
> configure() method to access the JobConf.
>
> Hope that helps.
>
> Cheers,
> Tom
>
> On Tue, Jul 14, 2009 at 10:27 AM, Jianmin Woo wrote:
>> Hi,
>>
>> I am considering implementing a Partitioner that needs to access the
>> parameters in the Configuration of the job. However, there is no straightforward
>> way to do this. Are there any suggestions?
>>
>> Thanks,
>> Jianmin
>>
>>
>>
>>
>
>
>
>



  

Re: Cluster configuration issue

2009-07-14 Thread Mac Yuki
Thanks for the help!  That bug looks like the one I'm running into, and the
code in my src/ directory matches the older one in the patch svn diff.
Here's a beginner question: how do I apply just that patch to my
installation?  Thanks. Mac


On Tue, Jul 14, 2009 at 10:03 AM, Tamir Kamara wrote:

> Hi,
>
> To restart hadoop on a specific node use a command like this:
> "hadoop-daemon.sh stop tasktracker" and after that the same command with
> start. You can also do the same with the datanode but it doesn't look like
> there's a problem there.
>
> I had the same problem with missing slots but I don't remember if it
> happened
> on maps. The fix in my case was this patch:
> https://issues.apache.org/jira/browse/HADOOP-5269
>
>
> Tamir
>
> On Tue, Jul 14, 2009 at 7:25 PM, KTM  wrote:
>
> > Hi, I'm running Hadoop 0.19.1 on a cluster with 8 machines, 7 of which
> are
> > used as slaves and the other the master, each with 2 dual-core AMD CPUs
> and
> > generous amounts of RAM.  I am running map-only jobs and have the slaves
> > set
> > up to have 4 mappers each, for a total of 28 available mappers.  When I
> > first start up my cluster, I am able to use all 28 mappers.  However,
> after
> > a short bit of time (~12 hours), jobs that I submit start using fewer
> > mappers.  I restarted my cluster last night, and currently only 19
> mappers
> > are running tasks even though more tasks are pending, with at least 2
> tasks
> > running per machine - so no machine has gone down.  I have checked that
> the
> > unused cores are actually sitting idle.  Any ideas for why this is
> > happening?  Is there a way to restart Hadoop on the individual slaves?
> >  Thanks! Mac
> >
>


Re: August Bay Area HUG

2009-07-14 Thread Christophe Bisciglia
Hey, we got the location details worked out with Yahoo! and have room
for up to 100 people.

This will be at the *Sunnyvale* campus - not the Santa Clara campus
many of you are used to. There was some confusion between us,
magnified by the changes to the mailing lists. Rest assured, this is
the *same* event.

Address is posted here:
http://www.meetup.com/Bay-Area-Hadoop-User-Group-HUG/calendar/10877366/

Christophe

On Tue, Jul 14, 2009 at 4:23 PM, Christophe
Bisciglia wrote:
> Hadoop Fans,
>
> We're looking forward to seeing many of you at tomorrow's meetup -
> especially folks from San Francisco and the North Bay.
>
> Next month, we're going to host the meetup in the South Bay. We'll
> alternate locations to make it easy for everyone to attend
> occasionally, regardless of where you spend your days.
>
> We're still working out details for the location, but it will be in
> the PA / MV / Sunnyvale / Santa Clara area. We'll update the meetup
> page once this is worked out, but if you RSVP now, we'll ensure we
> have enough capacity.
>
> August 19th, 6PM
> http://www.meetup.com/Bay-Area-Hadoop-User-Group-HUG/calendar/10877366/
>
> If you'd like to present, see the meetup page for details. We'll post
> the agenda by the end of the month.
>
> Cheers,
> Christophe
>
> --
> get hadoop: cloudera.com/hadoop
> online training: cloudera.com/hadoop-training
> blog: cloudera.com/blog
> twitter: twitter.com/cloudera
>



-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera


August Bay Area HUG

2009-07-14 Thread Christophe Bisciglia
Hadoop Fans,

We're looking forward to seeing many of you at tomorrow's meetup -
especially folks from San Francisco and the North Bay.

Next month, we're going to host the meetup in the South Bay. We'll
alternate locations to make it easy for everyone to attend
occasionally, regardless of where you spend your days.

We're still working out details for the location, but it will be in
the PA / MV / Sunnyvale / Santa Clara area. We'll update the meetup
page once this is worked out, but if you RSVP now, we'll ensure we
have enough capacity.

August 19th, 6PM
http://www.meetup.com/Bay-Area-Hadoop-User-Group-HUG/calendar/10877366/

If you'd like to present, see the meetup page for details. We'll post
the agenda by the end of the month.

Cheers,
Christophe

-- 
get hadoop: cloudera.com/hadoop
online training: cloudera.com/hadoop-training
blog: cloudera.com/blog
twitter: twitter.com/cloudera


Re: Hardware Manufacturer

2009-07-14 Thread Ted Dunning
I have used Dell, Silicon Mechanics and HP.

I had very good results with Dell and Silicon Mechanics.

Results with HP were marginal, but that was for database machines with
veritas file systems running on EMC hardware.  I will avoid that sort of
machine like the plague in the future due to high cost and LooNG install
cycle.

I evaluated Sun, but was never able to justify them since the late 90's in
spite of my sentimental attachments to their product (I bought my first Sun
in 1984).

I have used various off-brand machines with horrifically bad results.

On Tue, Jul 14, 2009 at 2:28 PM, Ryan Smith wrote:

> Kris,
>
> We are using EC2 already, I need information on a hardware mfgr for a
> *real-life* cluster now, thanks.  :)
>
>
>
>
>
> On Tue, Jul 14, 2009 at 5:18 PM, Kris Jirapinyo
> wrote:
>
> > Why don't you try Amazon EC2? :)
> >
> > On Tue, Jul 14, 2009 at 2:12 PM, Ryan Smith  > >wrote:
> >
> > > I'm having problems dealing with my server mfgr atm.  Is there a good
> > mfgr
> > > to go with?
> > >
> > > Any advice is helpful, thanks.
> > >
> > > -Ryan
> > >
> >
>



-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)


Re: map side Vs. Reduce side join

2009-07-14 Thread Owen O'Malley
Map-side join is almost always more efficient, but only handles some cases.
Reduce side joins always work, but require a complete map/reduce job to get
the join.

-- Owen


Re: Hardware Manufacturer

2009-07-14 Thread Chris Collins
I think if you ask 100 people you will get 100 unique answers.  I will
say that at my current company we use Sun hardware.  They offer a
"startup essentials" program if you're a startup, and they can be very
competitive with even the box shifters.  They run Linux and of course
Solaris.  We use their cheap commodity hardware options.  We found a couple
of advantages with these units:

- lights-out management
- can put a lot of memory in even their most basic boxes (the cheap
ECC options too)
- support is great
- seem to be very reliable
- they do have plenty of "exotic" machine offerings if your application calls
for them (we stay away mostly from that, though)


C

On Jul 14, 2009, at 2:12 PM, Ryan Smith wrote:

I'm having problems dealing with my server mfgr atm.  Is there a  
good mfgr

to go with?

Any advice is helpful, thanks.

-Ryan






Re: map side Vs. Reduce side join

2009-07-14 Thread Alan Gates
Usually doing a join on the map side depends on exploiting some
characteristic of the data (such as one input being small enough that it can
fit in memory and be replicated to every map, or both inputs already being
sorted on the same key, or both inputs already being partitioned
into the same number of partitions using an identical hash function).  If
none of those is true, then you have to preprocess.  This
preprocessing is equivalent to doing your join on the reduce side, using
Hadoop to group your keys.
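For example, a rough, untested sketch of the first case -- a replicated,
in-memory hash join driven by the DistributedCache, using the old mapred API.
The tab-separated "key<TAB>value" layout and the file handling here are
assumptions for illustration, not how Pig/Hive or the contrib join package
implement it:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ReplicatedJoinMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private final Map<String, String> smallTable = new HashMap<String, String>();

  public void configure(JobConf job) {
    try {
      // The small input was added with DistributedCache.addCacheFile(...) at
      // job submission time; load it fully into memory before map() runs.
      Path[] cached = DistributedCache.getLocalCacheFiles(job);
      BufferedReader in = new BufferedReader(new FileReader(cached[0].toString()));
      String line;
      while ((line = in.readLine()) != null) {
        String[] parts = line.split("\t", 2);
        if (parts.length == 2) {
          smallTable.put(parts[0], parts[1]);
        }
      }
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("could not load cached side table", e);
    }
  }

  public void map(LongWritable offset, Text record,
                  OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    String[] parts = record.toString().split("\t", 2);
    if (parts.length < 2) {
      return;
    }
    String match = smallTable.get(parts[0]);
    if (match != null) {
      // Emit the joined record; no reduce phase is needed for this join.
      out.collect(new Text(parts[0]), new Text(parts[1] + "\t" + match));
    }
  }
}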


As a side note, Pig and Hive both already implement join in Map and  
Reduce phases, so you could take a look at how those are implemented  
and when they recommend choosing each.


Alan.

On Jul 14, 2009, at 1:49 PM, bonito wrote:



Hello.
I would like to ask if there is any 'scenario' in which the reduce side join
is preferable to the map-side join.
One may claim that the map side join requires preprocessing of the input
sources.
Is there any other reason or reasons?
I am really interested in this.
Thank you.
--
View this message in context: 
http://www.nabble.com/map-side-Vs.-Reduce-side-join-tp24487391p24487391.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.





Re: Hardware Manufacturer:1

2009-07-14 Thread Todd Troxell
On Tue, Jul 14, 2009 at 05:12:31PM -0400, Ryan Smith wrote:
> I'm having problems dealing with my server mfgr atm.  Is there a good mfgr
> to go with?
> 
> Any advice is helpful, thanks.

I've had luck with Silicon Mechanics[0], though lately their disks are
shipping very slowly-- curious about what others would recommend.

[0] http://www.siliconmechanics.com/

-- 
Todd Troxell
http://rapidpacket.com/~xtat


Re: Hardware Manufacturer

2009-07-14 Thread Ryan Smith
Kris,

We are using EC2 already, I need information on a hardware mfgr for a
*real-life* cluster now, thanks.  :)





On Tue, Jul 14, 2009 at 5:18 PM, Kris Jirapinyo
wrote:

> Why don't you try Amazon EC2? :)
>
> On Tue, Jul 14, 2009 at 2:12 PM, Ryan Smith  >wrote:
>
> > I'm having problems dealing with my server mfgr atm.  Is there a good
> mfgr
> > to go with?
> >
> > Any advice is helpful, thanks.
> >
> > -Ryan
> >
>


Re: Hardware Manufacturer

2009-07-14 Thread Kris Jirapinyo
Why don't you try Amazon EC2? :)

On Tue, Jul 14, 2009 at 2:12 PM, Ryan Smith wrote:

> I'm having problems dealing with my server mfgr atm.  Is there a good mfgr
> to go with?
>
> Any advice is helpful, thanks.
>
> -Ryan
>


Hardware Manufacturer

2009-07-14 Thread Ryan Smith
I'm having problems dealing with my server mfgr atm.  Is there a good mfgr
to go with?

Any advice is helpful, thanks.

-Ryan


Question about job distribution

2009-07-14 Thread Divij Durve
If I have a query that I would normally fire on a database, and I want to fire
it against the data loaded into multiple nodes on Hadoop, will the query be
distributed over all the datanodes so it returns results faster, or will it
just be sent to 1 node? If so, is there a way to get it to distribute the
query instead of sending it to 1 node?
Thanks
Divij


Re: .gz as input files in streaming.

2009-07-14 Thread Miles Osborne
here is a part of a shell script I wrote which deals with compressed input
and produces compressed output (for streaming):

hadoop dfs -rmr $4
hadoop jar /usr/local/share/hadoop/contrib/streaming/hadoop-*-streaming.jar \
  -mapper $1 -reducer $2 -input $3/* -output $4 -file $1 -file $2 \
  -jobconf mapred.job.name="$5" \
  -jobconf stream.recordreader.compression=gzip \
  -jobconf mapred.output.compress=true \
  -jobconf mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec


2009/7/14 Dmitry Pushkarev 

> Dear hadoop users,
>
>
>
> Sorry for probably very common question, but is there a way to process
> folder with .gz files with streaming?
>
> In manual they only describe how to create GZIPped output, but I couldn't
> figure out how to use GZIPped files for input.
>
>
>
> Right now I create a list of these files and process them like "hadoop -cat
> $file |gzip -dc |" but that doesn't use data-locality of archives (each
> file
> is 64MB - exactly one block).
>
>
>
> A sample code or link to manual would be greatly appreciated.
>
>


-- 
The University of Edinburgh is a charitable body, registered in Scotland,
with registration number SC005336.


map side Vs. Reduce side join

2009-07-14 Thread bonito

Hello.
I would like to ask if there is any 'scenario' in which the reduce side join
is preferable to the map-side join.
One may claim that the map side join requires preprocessing of the input
sources.
Is there any other reason or reasons?
I am really interested in this.
Thank you.
-- 
View this message in context: 
http://www.nabble.com/map-side-Vs.-Reduce-side-join-tp24487391p24487391.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



.gz as input files in streaming.

2009-07-14 Thread Dmitry Pushkarev
Dear hadoop users, 

 

Sorry for a probably very common question, but is there a way to process a
folder of .gz files with streaming?

The manual only describes how to create gzipped output, but I couldn't
figure out how to use gzipped files for input.

Right now I create a list of these files and process them like "hadoop -cat
$file | gzip -dc |", but that doesn't use the data locality of the archives (each
file is 64 MB - exactly one block).

 

A sample code or link to manual would be greatly appreciated. 



Re: Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Ryan Smith
Thanks Matt.  Good to know.  Both my system and JVM are 32-bit, btw.

I was able to build libhdfs and package it after adding the one patch.
Thanks a lot.  I didn't even check to see if the issue was resolved.
One thing that is different in 0.20.0 is that libhdfs isn't right off the
hadoop home dir anymore.  Instead of the libs existing in
$HADOOP_HOME/libhdfs, they are in
$HADOOP_HOME/build/hadoop-X.Y.Z/c++/Linux-ARCH/lib

-Ryan


On Tue, Jul 14, 2009 at 4:06 PM, Matt Massie  wrote:

> Something to keep in mind too is that the native build determines whether
> you are running on a 32-bit or 64-bit platform by which version of Java you
> have installed.  If you have 32-bit Java installed on a 64-bit platform, it
> will build 32-bit libraries (instead of the expected 64-bit version).
>
> -Matt
>
>
> On Jul 14, 2009, at 1:00 PM, Todd Lipcon wrote:
>
>  On Tue, Jul 14, 2009 at 12:58 PM, Ryan Smith > >wrote:
>>
>>  Todd,
>>>
>>> Ill try it on Fedora, thanks again.
>>>
>>>
>> You can also apply the patch from HADOOP-5611 - it should apply cleanly.
>> Just download it and patch -p0 < HADOOP-5611.patch
>>
>> -Todd
>>
>>
>>
>>> -Ryan
>>>
>>> On Tue, Jul 14, 2009 at 3:55 PM, Todd Lipcon  wrote:
>>>
>>>  Hi Ryan,

 Sounds like HADOOP-5611:

>>> https://issues.apache.org/jira/browse/HADOOP-5611
>>>

 -Todd

 On Tue, Jul 14, 2009 at 12:49 PM, Ryan Smith <

>>> ryan.justin.sm...@gmail.com
>>>
 wrote:
>

  Hello,
>
> My problem was I didnt have g++ installed.  :)  So i installed g++ and
>
 I
>>>
 re-ran :
>
> ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
>
> and now I get a strange error on utils.  Any ideas  on this one?  Looks
> like
> its 64 bit related.
> My os is ubuntu 32 bit.
> $ uname -a
> Linux ubuntudev 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59
>
 UTC
>>>
 2009 i686 GNU/Linux
>
> Also, i tried building without -Dcompile.c++=1 and it didnt compile
>
 libhdfs

> so it seems libhdfs needs it.  Ill be digging thru the build.xml more
>
 to
>>>
 figure more out.
> Thanks again.
> -Ryan
>
>
> -
>
> compile-hdfs-classes:
>  [javac] Compiling 4 source files to
> /home/rsmith/hadoop-0.20.0/build/classes
>
> compile-core-native:
>
> check-c++-makefiles:
>
> create-c++-pipes-makefile:
>
> create-c++-utils-makefile:
>
> compile-c++-utils:
>   [exec] depbase=`echo impl/StringUtils.o | sed
> 's|[^/]*$|.deps/&|;s|\.o$||'`; \
>   [exec] if g++ -DHAVE_CONFIG_H -I.
> -I/home/rsmith/hadoop-0.20.0/src/c++/utils -I./impl
> -I/home/rsmith/hadoop-0.20.0/src/c++/utils/api -Wall -g -O2 -MT
> impl/StringUtils.o -MD -MP -MF "$depbase.Tpo" -c -o impl/StringUtils.o
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc; \
>   [exec] then mv -f "$depbase.Tpo" "$depbase.Po"; else rm -f
> "$depbase.Tpo"; exit 1; fi
>   [exec]
>
 /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:
>>>
 In

> function ‘uint64_t HadoopUtils::getCurrentMillis()’:
>   [exec]
>
 /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:74:

> error: ‘strerror’ was not declared in this scope
>   [exec]
>
 /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:
>>>
 In

> function ‘std::string HadoopUtils::quoteString(const std::string&,
>
 const
>>>
 char*)’:
>   [exec]
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:103:
>
 error:
>>>
 ‘strchr’ was not declared in this scope
>   [exec]
>
 /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:
>>>
 In

> function ‘std::string HadoopUtils::unquoteString(const std::string&)’:
>   [exec]
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:144:
>
 error:
>>>
 ‘strtol’ was not declared in this scope
>   [exec] make: *** [impl/StringUtils.o] Error 1
>
> BUILD FAILED
> /home/rsmith/hadoop-0.20.0/build.xml:1405: exec returned: 2
>
> Total time: 5 seconds
>
> --
>
>
>
>
>
>
>
> On Tue, Jul 14, 2009 at 3:32 PM, Todd Lipcon 
>
 wrote:
>>>

>  Hi Ryan,
>>
>> I've never seen that issue. It sounds to me like your C installation
>>
> is
>>>
 screwy - what OS are you running?
>>
>> I *think* (but am not certain) that if you leave out -Dcompile.c++=1
>>
> and

> just leave -Dlibhdfs, you'll avoid pipes but get libhdfs.
>>
>> -Todd
>>
>> On Tue, Jul 14, 2009 at 12:16 PM, Ryan Smith <
>>
> ryan.justin.sm...@gmail.com
>
>> wrote:
>>>
>>
>>  Hi Todd.
>>>
>>> Thanks, that was 

Re: Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Matt Massie
Something to keep in mind too is that the native build determines  
whether you are running on a 32-bit or 64-bit platform by which  
version of Java you have installed.  If you have 32-bit Java installed  
on a 64-bit platform, it will build 32-bit libraries (instead of the  
expected 64-bit version).


-Matt

On Jul 14, 2009, at 1:00 PM, Todd Lipcon wrote:

On Tue, Jul 14, 2009 at 12:58 PM, Ryan Smith >wrote:



Todd,

Ill try it on Fedora, thanks again.



You can also apply the patch from HADOOP-5611 - it should apply  
cleanly.

Just download it and patch -p0 < HADOOP-5611.patch

-Todd




-Ryan

On Tue, Jul 14, 2009 at 3:55 PM, Todd Lipcon   
wrote:



Hi Ryan,

Sounds like HADOOP-5611:

https://issues.apache.org/jira/browse/HADOOP-5611


-Todd

On Tue, Jul 14, 2009 at 12:49 PM, Ryan Smith <

ryan.justin.sm...@gmail.com

wrote:



Hello,

My problem was I didnt have g++ installed.  :)  So i installed g+ 
+ and

I

re-ran :

ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1

and now I get a strange error on utils.  Any ideas  on this one?   
Looks

like
its 64 bit related.
My os is ubuntu 32 bit.
$ uname -a
Linux ubuntudev 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17  
01:57:59

UTC

2009 i686 GNU/Linux

Also, i tried building without -Dcompile.c++=1 and it didnt compile

libhdfs
so it seems libhdfs needs it.  Ill be digging thru the build.xml  
more

to

figure more out.
Thanks again.
-Ryan


-

compile-hdfs-classes:
  [javac] Compiling 4 source files to
/home/rsmith/hadoop-0.20.0/build/classes

compile-core-native:

check-c++-makefiles:

create-c++-pipes-makefile:

create-c++-utils-makefile:

compile-c++-utils:
   [exec] depbase=`echo impl/StringUtils.o | sed
's|[^/]*$|.deps/&|;s|\.o$||'`; \
   [exec] if g++ -DHAVE_CONFIG_H -I.
-I/home/rsmith/hadoop-0.20.0/src/c++/utils -I./impl
-I/home/rsmith/hadoop-0.20.0/src/c++/utils/api -Wall -g -O2 -MT
impl/StringUtils.o -MD -MP -MF "$depbase.Tpo" -c -o impl/ 
StringUtils.o

/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc; \
   [exec] then mv -f "$depbase.Tpo" "$depbase.Po"; else rm -f
"$depbase.Tpo"; exit 1; fi
   [exec]

/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:

In

function ‘uint64_t HadoopUtils::getCurrentMillis()’:
   [exec]

/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:74:

error: ‘strerror’ was not declared in this scope
   [exec]

/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:

In

function ‘std::string HadoopUtils::quoteString(const std::string&,

const

char*)’:
   [exec]
/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:103:

error:

‘strchr’ was not declared in this scope
   [exec]

/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:

In
function ‘std::string HadoopUtils::unquoteString(const  
std::string&)’:

   [exec]
/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:144:

error:

‘strtol’ was not declared in this scope
   [exec] make: *** [impl/StringUtils.o] Error 1

BUILD FAILED
/home/rsmith/hadoop-0.20.0/build.xml:1405: exec returned: 2

Total time: 5 seconds

--







On Tue, Jul 14, 2009 at 3:32 PM, Todd Lipcon 

wrote:



Hi Ryan,

I've never seen that issue. It sounds to me like your C  
installation

is

screwy - what OS are you running?

I *think* (but am not certain) that if you leave out -Dcompile.c+ 
+=1

and

just leave -Dlibhdfs, you'll avoid pipes but get libhdfs.

-Todd

On Tue, Jul 14, 2009 at 12:16 PM, Ryan Smith <

ryan.justin.sm...@gmail.com

wrote:



Hi Todd.

Thanks, that was it.  Now I got an error during the configure

script.

Any

ideas?

[exec] configure: error: C++ preprocessor "/lib/cpp" fails sanity

check


Please see full details below.

http://pastebin.com/m79ff6ef7

Also, is there a way to avoid building pipes? I just need  
libhdfs.

Thanks.

-Ryan



On Tue, Jul 14, 2009 at 2:50 PM, Todd Lipcon 

wrote:



Hi Ryan,

To fix this you can simply chmod 755 that configure script

referenced

in

the
error.

There is a JIRA for this that I think got committed that adds

another

 task to build.xml, but it may not be in 0.20.0.

Thanks
-Todd

On Tue, Jul 14, 2009 at 11:36 AM, Ryan Smith <

ryan.justin.sm...@gmail.com

wrote:



I run ant clean then :
ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1

Below is the error output.

Is this the right way to build libhdfs?

0.18.3 and 0.19.1 builds libhdfs for me just fine.





record-parser:

compile-rcc-compiler:

compile-core-classes:
  [javac] Compiling 1 source file to

/home/jim/hadoop-0.20.0/build/classes


compile-mapred-classes:
  [javac] Compiling 1 source file to

/home/jim/hadoop-0.20.0/build/classes


compile-hdfs-classes:
  [javac] Compiling 4 source files to
/home/jim/hadoop-0.20.0/build/classes

compile-core-native:

check-c++-makefiles:

create-c++-pipes-makefile:

BUILD FAILED
/home/jim/hadoop-0.20.0/build.xml:1414: Execute failed:
java

Re: Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Todd Lipcon
On Tue, Jul 14, 2009 at 12:58 PM, Ryan Smith wrote:

> Todd,
>
> Ill try it on Fedora, thanks again.
>

You can also apply the patch from HADOOP-5611 - it should apply cleanly.
Just download it and patch -p0 < HADOOP-5611.patch

-Todd


>
> -Ryan
>
> On Tue, Jul 14, 2009 at 3:55 PM, Todd Lipcon  wrote:
>
> > Hi Ryan,
> >
> > Sounds like HADOOP-5611:
> https://issues.apache.org/jira/browse/HADOOP-5611
> >
> > -Todd
> >
> > On Tue, Jul 14, 2009 at 12:49 PM, Ryan Smith <
> ryan.justin.sm...@gmail.com
> > >wrote:
> >
> > > Hello,
> > >
> > > My problem was I didnt have g++ installed.  :)  So i installed g++ and
> I
> > > re-ran :
> > >
> > >  ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
> > >
> > > and now I get a strange error on utils.  Any ideas  on this one?  Looks
> > > like
> > > its 64 bit related.
> > > My os is ubuntu 32 bit.
> > > $ uname -a
> > > Linux ubuntudev 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59
> UTC
> > > 2009 i686 GNU/Linux
> > >
> > > Also, i tried building without -Dcompile.c++=1 and it didnt compile
> > libhdfs
> > > so it seems libhdfs needs it.  Ill be digging thru the build.xml more
> to
> > > figure more out.
> > > Thanks again.
> > > -Ryan
> > >
> > >
> > > -
> > >
> > > compile-hdfs-classes:
> > >[javac] Compiling 4 source files to
> > > /home/rsmith/hadoop-0.20.0/build/classes
> > >
> > > compile-core-native:
> > >
> > > check-c++-makefiles:
> > >
> > > create-c++-pipes-makefile:
> > >
> > > create-c++-utils-makefile:
> > >
> > > compile-c++-utils:
> > > [exec] depbase=`echo impl/StringUtils.o | sed
> > > 's|[^/]*$|.deps/&|;s|\.o$||'`; \
> > > [exec] if g++ -DHAVE_CONFIG_H -I.
> > > -I/home/rsmith/hadoop-0.20.0/src/c++/utils -I./impl
> > > -I/home/rsmith/hadoop-0.20.0/src/c++/utils/api -Wall -g -O2 -MT
> > > impl/StringUtils.o -MD -MP -MF "$depbase.Tpo" -c -o impl/StringUtils.o
> > > /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc; \
> > > [exec] then mv -f "$depbase.Tpo" "$depbase.Po"; else rm -f
> > > "$depbase.Tpo"; exit 1; fi
> > > [exec]
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:
> > In
> > > function ‘uint64_t HadoopUtils::getCurrentMillis()’:
> > > [exec]
> > /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:74:
> > > error: ‘strerror’ was not declared in this scope
> > > [exec]
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:
> > In
> > > function ‘std::string HadoopUtils::quoteString(const std::string&,
> const
> > > char*)’:
> > > [exec]
> > > /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:103:
> error:
> > > ‘strchr’ was not declared in this scope
> > > [exec]
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:
> > In
> > > function ‘std::string HadoopUtils::unquoteString(const std::string&)’:
> > > [exec]
> > > /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:144:
> error:
> > > ‘strtol’ was not declared in this scope
> > > [exec] make: *** [impl/StringUtils.o] Error 1
> > >
> > > BUILD FAILED
> > > /home/rsmith/hadoop-0.20.0/build.xml:1405: exec returned: 2
> > >
> > > Total time: 5 seconds
> > >
> > > --
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Jul 14, 2009 at 3:32 PM, Todd Lipcon 
> wrote:
> > >
> > > > Hi Ryan,
> > > >
> > > > I've never seen that issue. It sounds to me like your C installation
> is
> > > > screwy - what OS are you running?
> > > >
> > > > I *think* (but am not certain) that if you leave out -Dcompile.c++=1
> > and
> > > > just leave -Dlibhdfs, you'll avoid pipes but get libhdfs.
> > > >
> > > > -Todd
> > > >
> > > > On Tue, Jul 14, 2009 at 12:16 PM, Ryan Smith <
> > > ryan.justin.sm...@gmail.com
> > > > >wrote:
> > > >
> > > > > Hi Todd.
> > > > >
> > > > > Thanks, that was it.  Now I got an error during the configure
> script.
> > > >  Any
> > > > > ideas?
> > > > >
> > > > >  [exec] configure: error: C++ preprocessor "/lib/cpp" fails sanity
> > > check
> > > > >
> > > > > Please see full details below.
> > > > >
> > > > > http://pastebin.com/m79ff6ef7
> > > > >
> > > > > Also, is there a way to avoid building pipes? I just need libhdfs.
> > > >  Thanks.
> > > > > -Ryan
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Jul 14, 2009 at 2:50 PM, Todd Lipcon 
> > > wrote:
> > > > >
> > > > > > Hi Ryan,
> > > > > >
> > > > > > To fix this you can simply chmod 755 that configure script
> > referenced
> > > > in
> > > > > > the
> > > > > > error.
> > > > > >
> > > > > > There is a JIRA for this that I think got committed that adds
> > another
> > > > > >  task to build.xml, but it may not be in 0.20.0.
> > > > > >
> > > > > > Thanks
> > > > > > -Todd
> > > > > >
> > > > > > On Tue, Jul 14, 2009 at 11:36 AM, Ryan Smith <
> > > > > ryan.justin.sm...@gmail.com
> > > > > > >wrote:
> > > > > >
> > > > > > > I run ant clean then :
> > > > > > > ant compile-contrib 

Re: Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Ryan Smith
Todd,

I'll try it on Fedora, thanks again.

-Ryan

On Tue, Jul 14, 2009 at 3:55 PM, Todd Lipcon  wrote:

> Hi Ryan,
>
> Sounds like HADOOP-5611: https://issues.apache.org/jira/browse/HADOOP-5611
>
> -Todd
>
> On Tue, Jul 14, 2009 at 12:49 PM, Ryan Smith  >wrote:
>
> > Hello,
> >
> > My problem was I didnt have g++ installed.  :)  So i installed g++ and I
> > re-ran :
> >
> >  ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
> >
> > and now I get a strange error on utils.  Any ideas  on this one?  Looks
> > like
> > its 64 bit related.
> > My os is ubuntu 32 bit.
> > $ uname -a
> > Linux ubuntudev 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC
> > 2009 i686 GNU/Linux
> >
> > Also, i tried building without -Dcompile.c++=1 and it didnt compile
> libhdfs
> > so it seems libhdfs needs it.  Ill be digging thru the build.xml more to
> > figure more out.
> > Thanks again.
> > -Ryan
> >
> >
> > -
> >
> > compile-hdfs-classes:
> >[javac] Compiling 4 source files to
> > /home/rsmith/hadoop-0.20.0/build/classes
> >
> > compile-core-native:
> >
> > check-c++-makefiles:
> >
> > create-c++-pipes-makefile:
> >
> > create-c++-utils-makefile:
> >
> > compile-c++-utils:
> > [exec] depbase=`echo impl/StringUtils.o | sed
> > 's|[^/]*$|.deps/&|;s|\.o$||'`; \
> > [exec] if g++ -DHAVE_CONFIG_H -I.
> > -I/home/rsmith/hadoop-0.20.0/src/c++/utils -I./impl
> > -I/home/rsmith/hadoop-0.20.0/src/c++/utils/api -Wall -g -O2 -MT
> > impl/StringUtils.o -MD -MP -MF "$depbase.Tpo" -c -o impl/StringUtils.o
> > /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc; \
> > [exec] then mv -f "$depbase.Tpo" "$depbase.Po"; else rm -f
> > "$depbase.Tpo"; exit 1; fi
> > [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:
> In
> > function ‘uint64_t HadoopUtils::getCurrentMillis()’:
> > [exec]
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:74:
> > error: ‘strerror’ was not declared in this scope
> > [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:
> In
> > function ‘std::string HadoopUtils::quoteString(const std::string&, const
> > char*)’:
> > [exec]
> > /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:103: error:
> > ‘strchr’ was not declared in this scope
> > [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:
> In
> > function ‘std::string HadoopUtils::unquoteString(const std::string&)’:
> > [exec]
> > /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:144: error:
> > ‘strtol’ was not declared in this scope
> > [exec] make: *** [impl/StringUtils.o] Error 1
> >
> > BUILD FAILED
> > /home/rsmith/hadoop-0.20.0/build.xml:1405: exec returned: 2
> >
> > Total time: 5 seconds
> >
> > --
> >
> >
> >
> >
> >
> >
> >
> > On Tue, Jul 14, 2009 at 3:32 PM, Todd Lipcon  wrote:
> >
> > > Hi Ryan,
> > >
> > > I've never seen that issue. It sounds to me like your C installation is
> > > screwy - what OS are you running?
> > >
> > > I *think* (but am not certain) that if you leave out -Dcompile.c++=1
> and
> > > just leave -Dlibhdfs, you'll avoid pipes but get libhdfs.
> > >
> > > -Todd
> > >
> > > On Tue, Jul 14, 2009 at 12:16 PM, Ryan Smith <
> > ryan.justin.sm...@gmail.com
> > > >wrote:
> > >
> > > > Hi Todd.
> > > >
> > > > Thanks, that was it.  Now I got an error during the configure script.
> > >  Any
> > > > ideas?
> > > >
> > > >  [exec] configure: error: C++ preprocessor "/lib/cpp" fails sanity
> > check
> > > >
> > > > Please see full details below.
> > > >
> > > > http://pastebin.com/m79ff6ef7
> > > >
> > > > Also, is there a way to avoid building pipes? I just need libhdfs.
> > >  Thanks.
> > > > -Ryan
> > > >
> > > >
> > > >
> > > > On Tue, Jul 14, 2009 at 2:50 PM, Todd Lipcon 
> > wrote:
> > > >
> > > > > Hi Ryan,
> > > > >
> > > > > To fix this you can simply chmod 755 that configure script
> referenced
> > > in
> > > > > the
> > > > > error.
> > > > >
> > > > > There is a JIRA for this that I think got committed that adds
> another
> > > > >  task to build.xml, but it may not be in 0.20.0.
> > > > >
> > > > > Thanks
> > > > > -Todd
> > > > >
> > > > > On Tue, Jul 14, 2009 at 11:36 AM, Ryan Smith <
> > > > ryan.justin.sm...@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > I run ant clean then :
> > > > > > ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
> > > > > >
> > > > > > Below is the error output.
> > > > > >
> > > > > > Is this the right way to build libhdfs?
> > > > > >
> > > > > > 0.18.3 and 0.19.1 builds libhdfs for me just fine.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > record-parser:
> > > > > >
> > > > > > compile-rcc-compiler:
> > > > > >
> > > > > > compile-core-classes:
> > > > > >[javac] Compiling 1 source file to
> > > > > /home/jim/hadoop-0.20.0/build/classes
> > > > > >
> > > > > > compile-mapred-classes:
> > > > > >[j

Re: Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Todd Lipcon
Hi Ryan,

Sounds like HADOOP-5611: https://issues.apache.org/jira/browse/HADOOP-5611

-Todd

On Tue, Jul 14, 2009 at 12:49 PM, Ryan Smith wrote:

> Hello,
>
> My problem was I didnt have g++ installed.  :)  So i installed g++ and I
> re-ran :
>
>  ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
>
> and now I get a strange error on utils.  Any ideas  on this one?  Looks
> like
> its 64 bit related.
> My os is ubuntu 32 bit.
> $ uname -a
> Linux ubuntudev 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC
> 2009 i686 GNU/Linux
>
> Also, i tried building without -Dcompile.c++=1 and it didnt compile libhdfs
> so it seems libhdfs needs it.  Ill be digging thru the build.xml more to
> figure more out.
> Thanks again.
> -Ryan
>
>
> -
>
> compile-hdfs-classes:
>[javac] Compiling 4 source files to
> /home/rsmith/hadoop-0.20.0/build/classes
>
> compile-core-native:
>
> check-c++-makefiles:
>
> create-c++-pipes-makefile:
>
> create-c++-utils-makefile:
>
> compile-c++-utils:
> [exec] depbase=`echo impl/StringUtils.o | sed
> 's|[^/]*$|.deps/&|;s|\.o$||'`; \
> [exec] if g++ -DHAVE_CONFIG_H -I.
> -I/home/rsmith/hadoop-0.20.0/src/c++/utils -I./impl
> -I/home/rsmith/hadoop-0.20.0/src/c++/utils/api -Wall -g -O2 -MT
> impl/StringUtils.o -MD -MP -MF "$depbase.Tpo" -c -o impl/StringUtils.o
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc; \
> [exec] then mv -f "$depbase.Tpo" "$depbase.Po"; else rm -f
> "$depbase.Tpo"; exit 1; fi
> [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc: In
> function ‘uint64_t HadoopUtils::getCurrentMillis()’:
> [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:74:
> error: ‘strerror’ was not declared in this scope
> [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc: In
> function ‘std::string HadoopUtils::quoteString(const std::string&, const
> char*)’:
> [exec]
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:103: error:
> ‘strchr’ was not declared in this scope
> [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc: In
> function ‘std::string HadoopUtils::unquoteString(const std::string&)’:
> [exec]
> /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:144: error:
> ‘strtol’ was not declared in this scope
> [exec] make: *** [impl/StringUtils.o] Error 1
>
> BUILD FAILED
> /home/rsmith/hadoop-0.20.0/build.xml:1405: exec returned: 2
>
> Total time: 5 seconds
>
> --
>
>
>
>
>
>
>
> On Tue, Jul 14, 2009 at 3:32 PM, Todd Lipcon  wrote:
>
> > Hi Ryan,
> >
> > I've never seen that issue. It sounds to me like your C installation is
> > screwy - what OS are you running?
> >
> > I *think* (but am not certain) that if you leave out -Dcompile.c++=1 and
> > just leave -Dlibhdfs, you'll avoid pipes but get libhdfs.
> >
> > -Todd
> >
> > On Tue, Jul 14, 2009 at 12:16 PM, Ryan Smith <
> ryan.justin.sm...@gmail.com
> > >wrote:
> >
> > > Hi Todd.
> > >
> > > Thanks, that was it.  Now I got an error during the configure script.
> >  Any
> > > ideas?
> > >
> > >  [exec] configure: error: C++ preprocessor "/lib/cpp" fails sanity
> check
> > >
> > > Please see full details below.
> > >
> > > http://pastebin.com/m79ff6ef7
> > >
> > > Also, is there a way to avoid building pipes? I just need libhdfs.
> >  Thanks.
> > > -Ryan
> > >
> > >
> > >
> > > On Tue, Jul 14, 2009 at 2:50 PM, Todd Lipcon 
> wrote:
> > >
> > > > Hi Ryan,
> > > >
> > > > To fix this you can simply chmod 755 that configure script referenced
> > in
> > > > the
> > > > error.
> > > >
> > > > There is a JIRA for this that I think got committed that adds another
> > > >  task to build.xml, but it may not be in 0.20.0.
> > > >
> > > > Thanks
> > > > -Todd
> > > >
> > > > On Tue, Jul 14, 2009 at 11:36 AM, Ryan Smith <
> > > ryan.justin.sm...@gmail.com
> > > > >wrote:
> > > >
> > > > > I run ant clean then :
> > > > > ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
> > > > >
> > > > > Below is the error output.
> > > > >
> > > > > Is this the right way to build libhdfs?
> > > > >
> > > > > 0.18.3 and 0.19.1 builds libhdfs for me just fine.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > record-parser:
> > > > >
> > > > > compile-rcc-compiler:
> > > > >
> > > > > compile-core-classes:
> > > > >[javac] Compiling 1 source file to
> > > > /home/jim/hadoop-0.20.0/build/classes
> > > > >
> > > > > compile-mapred-classes:
> > > > >[javac] Compiling 1 source file to
> > > > /home/jim/hadoop-0.20.0/build/classes
> > > > >
> > > > > compile-hdfs-classes:
> > > > >[javac] Compiling 4 source files to
> > > > > /home/jim/hadoop-0.20.0/build/classes
> > > > >
> > > > > compile-core-native:
> > > > >
> > > > > check-c++-makefiles:
> > > > >
> > > > > create-c++-pipes-makefile:
> > > > >
> > > > > BUILD FAILED
> > > > > /home/jim/hadoop-0.20.0/build.xml:1414: Execute faile

Re: Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Ryan Smith
Hello,

My problem was I didn't have g++ installed.  :)  So I installed g++ and
re-ran:

 ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1

and now I get a strange error on utils. Any ideas on this one? It looks like
it's 64-bit related, but my OS is Ubuntu 32-bit.
$ uname -a
Linux ubuntudev 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC
2009 i686 GNU/Linux

Also, I tried building without -Dcompile.c++=1 and it didn't compile libhdfs,
so it seems libhdfs needs it. I'll be digging through build.xml to figure
out more.
Thanks again.
-Ryan


-

compile-hdfs-classes:
[javac] Compiling 4 source files to
/home/rsmith/hadoop-0.20.0/build/classes

compile-core-native:

check-c++-makefiles:

create-c++-pipes-makefile:

create-c++-utils-makefile:

compile-c++-utils:
 [exec] depbase=`echo impl/StringUtils.o | sed
's|[^/]*$|.deps/&|;s|\.o$||'`; \
 [exec] if g++ -DHAVE_CONFIG_H -I.
-I/home/rsmith/hadoop-0.20.0/src/c++/utils -I./impl
-I/home/rsmith/hadoop-0.20.0/src/c++/utils/api -Wall -g -O2 -MT
impl/StringUtils.o -MD -MP -MF "$depbase.Tpo" -c -o impl/StringUtils.o
/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc; \
 [exec] then mv -f "$depbase.Tpo" "$depbase.Po"; else rm -f
"$depbase.Tpo"; exit 1; fi
 [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc: In
function ‘uint64_t HadoopUtils::getCurrentMillis()’:
 [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:74:
error: ‘strerror’ was not declared in this scope
 [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc: In
function ‘std::string HadoopUtils::quoteString(const std::string&, const
char*)’:
 [exec]
/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:103: error:
‘strchr’ was not declared in this scope
 [exec] /home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc: In
function ‘std::string HadoopUtils::unquoteString(const std::string&)’:
 [exec]
/home/rsmith/hadoop-0.20.0/src/c++/utils/impl/StringUtils.cc:144: error:
‘strtol’ was not declared in this scope
 [exec] make: *** [impl/StringUtils.o] Error 1

BUILD FAILED
/home/rsmith/hadoop-0.20.0/build.xml:1405: exec returned: 2

Total time: 5 seconds

--







On Tue, Jul 14, 2009 at 3:32 PM, Todd Lipcon  wrote:

> Hi Ryan,
>
> I've never seen that issue. It sounds to me like your C installation is
> screwy - what OS are you running?
>
> I *think* (but am not certain) that if you leave out -Dcompile.c++=1 and
> just leave -Dlibhdfs, you'll avoid pipes but get libhdfs.
>
> -Todd
>
> On Tue, Jul 14, 2009 at 12:16 PM, Ryan Smith  >wrote:
>
> > Hi Todd.
> >
> > Thanks, that was it.  Now I got an error during the configure script.
>  Any
> > ideas?
> >
> >  [exec] configure: error: C++ preprocessor "/lib/cpp" fails sanity check
> >
> > Please see full details below.
> >
> > http://pastebin.com/m79ff6ef7
> >
> > Also, is there a way to avoid building pipes? I just need libhdfs.
>  Thanks.
> > -Ryan
> >
> >
> >
> > On Tue, Jul 14, 2009 at 2:50 PM, Todd Lipcon  wrote:
> >
> > > Hi Ryan,
> > >
> > > To fix this you can simply chmod 755 that configure script referenced
> in
> > > the
> > > error.
> > >
> > > There is a JIRA for this that I think got committed that adds another
> > >  task to build.xml, but it may not be in 0.20.0.
> > >
> > > Thanks
> > > -Todd
> > >
> > > On Tue, Jul 14, 2009 at 11:36 AM, Ryan Smith <
> > ryan.justin.sm...@gmail.com
> > > >wrote:
> > >
> > > > I run ant clean then :
> > > > ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
> > > >
> > > > Below is the error output.
> > > >
> > > > Is this the right way to build libhdfs?
> > > >
> > > > 0.18.3 and 0.19.1 builds libhdfs for me just fine.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > record-parser:
> > > >
> > > > compile-rcc-compiler:
> > > >
> > > > compile-core-classes:
> > > >[javac] Compiling 1 source file to
> > > /home/jim/hadoop-0.20.0/build/classes
> > > >
> > > > compile-mapred-classes:
> > > >[javac] Compiling 1 source file to
> > > /home/jim/hadoop-0.20.0/build/classes
> > > >
> > > > compile-hdfs-classes:
> > > >[javac] Compiling 4 source files to
> > > > /home/jim/hadoop-0.20.0/build/classes
> > > >
> > > > compile-core-native:
> > > >
> > > > check-c++-makefiles:
> > > >
> > > > create-c++-pipes-makefile:
> > > >
> > > > BUILD FAILED
> > > > /home/jim/hadoop-0.20.0/build.xml:1414: Execute failed:
> > > > java.io.IOException:
> > > > Cannot run program "/home/jim/hadoop-0.20.0/src/c++/pipes/configure"
> > (in
> > > > directory
> > "/home/jim/hadoop-0.20.0/build/c++-build/Linux-i386-32/pipes"):
> > > > java.io.IOException: error=13, Permission denied
> > > >
> > >
> >
>


Re: hadoop cluster startup error

2009-07-14 Thread Divij Durve
I have started the cluster, and in spite of that I don't get any of the
processes when I type ps. The only thing it shows is 2 Java processes on
the name node.

I'm trying to do the word count example, but this is what is happening:

[di...@gobi bin]$ ./hadoop fs -get /usr/divij/wordcount/input ./file01
[di...@gobi bin]$ ./hadoop fs -get /usr/divij/wordcount/input ./file02
[di...@gobi bin]$ ./hadoop fs -ls
ls: Cannot access .: No such file or directory.
[di...@gobi bin]$ ./hadoop fs -ls /usr/divij/wordcount/input
[di...@gobi bin]$ ./hadoop dfs -cat >/usr/divij/wordcount/input/file01
-bash: /usr/divij/wordcount/input/file01: No such file or directory
[di...@gobi bin]$ ./hadoop dfs -cat /usr/divij/wordcount/input/file01
cat: File does not exist: /usr/divij/wordcount/input/file01
[di...@gobi bin]$ ./hadoop fs -get /usr/divij/wordcount/input/ ./file01
[di...@gobi bin]$ ./hadoop fs -cat /usr/divij/wordcount/input/file01
cat: File does not exist: /usr/divij/wordcount/input/file01
[di...@gobi bin]$ ./hadoop dfs -cat /usr/divij/wordcount/input/file01
cat: File does not exist: /usr/divij/wordcount/input/file01
[di...@gobi bin]$ ./hadoop fs -test -e /usr/divij/wordcount/input/file01
[di...@gobi bin]$ ./hadoop fs -tail /usr/divij/wordcount/input/file01
tail: File does not exist: /usr/divij/wordcount/input/file01
[di...@gobi bin]$

I really don't know what's wrong. Can anyone help?


On Mon, Jul 13, 2009 at 6:15 PM, Aaron Kimball  wrote:

> I'm not convinced anything is wrong with the TaskTracker. Can you run jobs?
> Does the pi example work? If not, what error does it give?
>
> If you're trying to configure your SecondaryNameNode on a different host
> than your NameNode, you'll need to do some configuration tweaking. I wrote
> a
> blog post with instructions on how to do this, here:
>
> http://www.cloudera.com/blog/2009/02/10/multi-host-secondarynamenode-configuration/
>
> Good luck!
> - Aaron
>
> On Mon, Jul 13, 2009 at 3:00 PM, Divij Durve  wrote:
>
> > here is the log file from the secondary namenode:
> >
> > 2009-07-13 17:36:13,274 INFO
> > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: STARTUP_MSG:
> > /
> > STARTUP_MSG: Starting SecondaryNameNode
> > STARTUP_MSG:   host = kalahari.mssm.edu/127.0.0.1
> > STARTUP_MSG:   args = []
> > STARTUP_MSG:   version = 0.19.0
> > STARTUP_MSG:   build =
> > https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.19 -r
> > 713890;
> > compiled by 'ndaley' on Fri Nov 14 03:12:29 UTC 2008
> > /
> > 2009-07-13 17:36:13,350 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
> > Initializing JVM Metrics with processName=SecondaryNameNode,
> sessionId=null
> > 2009-07-13 17:36:13,445 INFO
> org.apache.hadoop.hdfs.server.common.Storage:
> > Recovering storage directory
> > /home/divij/hive/build/hadoopcore/hadoop-0.19.0/hadoo
> > p-tmp/dfs/namesecondary from failed checkpoint.
> > 2009-07-13 17:36:13,516 INFO org.mortbay.http.HttpServer: Version
> > Jetty/5.1.4
> > 2009-07-13 17:36:13,522 INFO org.mortbay.util.Credential: Checking
> Resource
> > aliases
> > 2009-07-13 17:36:13,729 INFO org.mortbay.util.Container: Started
> > org.mortbay.jetty.servlet.webapplicationhand...@6602e323
> > 2009-07-13 17:36:13,772 INFO org.mortbay.util.Container: Started
> > WebApplicationContext[/static,/static]
> > 2009-07-13 17:36:13,855 INFO org.mortbay.util.Container: Started
> > org.mortbay.jetty.servlet.webapplicationhand...@77546dbc
> > 2009-07-13 17:36:13,866 INFO org.mortbay.util.Container: Started
> > WebApplicationContext[/logs,/logs]
> > 2009-07-13 17:36:13,945 INFO org.mortbay.jetty.servlet.XMLConfiguration:
> No
> > WEB-INF/web.xml in
> > file:/home/divij/hive/build/hadoopcore/hadoop-0.19.0/webapps/secondary.
> > Serving files and default/dynamic servlets only
> > 2009-07-13 17:36:13,946 INFO org.mortbay.util.Container: Started
> > org.mortbay.jetty.servlet.webapplicationhand...@52c00025
> > 2009-07-13 17:36:13,960 INFO org.mortbay.util.Container: Started
> > WebApplicationContext[/,/]
> > 2009-07-13 17:36:13,962 INFO org.mortbay.http.SocketListener: Started
> > SocketListener on 0.0.0.0:50090
> > 2009-07-13 17:36:13,962 INFO org.mortbay.util.Container: Started
> > org.mortbay.jetty.ser...@2f74219d
> > 2009-07-13 17:36:13,962 INFO
> > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Secondary
> > Web-server up at: 0.0.0.0:50090
> > 2009-07-13 17:36:13,962 WARN
> > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Checkpoint
> > Period   :3600 secs (60 min)
> > 2009-07-13 17:36:13,963 WARN
> > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Log Size
> > Trigger:67108864 bytes (65536 KB)
> > 2009-07-13 17:41:14,018 ERROR
> > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
> > doCheckpoint:
> > 2009-07-13 17:41:14,019 ERROR
> > org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
> > java.net.ConnectExceptio

Re: Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Todd Lipcon
Hi Ryan,

I've never seen that issue. It sounds to me like your C installation is
screwy - what OS are you running?

I *think* (but am not certain) that if you leave out -Dcompile.c++=1 and
just leave -Dlibhdfs, you'll avoid pipes but get libhdfs.

-Todd

On Tue, Jul 14, 2009 at 12:16 PM, Ryan Smith wrote:

> Hi Todd.
>
> Thanks, that was it.  Now I got an error during the configure script.  Any
> ideas?
>
>  [exec] configure: error: C++ preprocessor "/lib/cpp" fails sanity check
>
> Please see full details below.
>
> http://pastebin.com/m79ff6ef7
>
> Also, is there a way to avoid building pipes? I just need libhdfs.  Thanks.
> -Ryan
>
>
>
> On Tue, Jul 14, 2009 at 2:50 PM, Todd Lipcon  wrote:
>
> > Hi Ryan,
> >
> > To fix this you can simply chmod 755 that configure script referenced in
> > the
> > error.
> >
> > There is a JIRA for this that I think got committed that adds another
> >  task to build.xml, but it may not be in 0.20.0.
> >
> > Thanks
> > -Todd
> >
> > On Tue, Jul 14, 2009 at 11:36 AM, Ryan Smith <
> ryan.justin.sm...@gmail.com
> > >wrote:
> >
> > > I run ant clean then :
> > > ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
> > >
> > > Below is the error output.
> > >
> > > Is this the right way to build libhdfs?
> > >
> > > 0.18.3 and 0.19.1 builds libhdfs for me just fine.
> > >
> > >
> > >
> > >
> > >
> > > record-parser:
> > >
> > > compile-rcc-compiler:
> > >
> > > compile-core-classes:
> > >[javac] Compiling 1 source file to
> > /home/jim/hadoop-0.20.0/build/classes
> > >
> > > compile-mapred-classes:
> > >[javac] Compiling 1 source file to
> > /home/jim/hadoop-0.20.0/build/classes
> > >
> > > compile-hdfs-classes:
> > >[javac] Compiling 4 source files to
> > > /home/jim/hadoop-0.20.0/build/classes
> > >
> > > compile-core-native:
> > >
> > > check-c++-makefiles:
> > >
> > > create-c++-pipes-makefile:
> > >
> > > BUILD FAILED
> > > /home/jim/hadoop-0.20.0/build.xml:1414: Execute failed:
> > > java.io.IOException:
> > > Cannot run program "/home/jim/hadoop-0.20.0/src/c++/pipes/configure"
> (in
> > > directory
> "/home/jim/hadoop-0.20.0/build/c++-build/Linux-i386-32/pipes"):
> > > java.io.IOException: error=13, Permission denied
> > >
> >
>


Re: Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Ryan Smith
Hi Todd.

Thanks, that was it.  Now I got an error during the configure script.  Any
ideas?

  [exec] configure: error: C++ preprocessor "/lib/cpp" fails sanity check

Please see full details below.

http://pastebin.com/m79ff6ef7

Also, is there a way to avoid building pipes? I just need libhdfs.  Thanks.
-Ryan



On Tue, Jul 14, 2009 at 2:50 PM, Todd Lipcon  wrote:

> Hi Ryan,
>
> To fix this you can simply chmod 755 that configure script referenced in
> the
> error.
>
> There is a JIRA for this that I think got committed that adds another
>  task to build.xml, but it may not be in 0.20.0.
>
> Thanks
> -Todd
>
> On Tue, Jul 14, 2009 at 11:36 AM, Ryan Smith  >wrote:
>
> > I run ant clean then :
> > ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
> >
> > Below is the error output.
> >
> > Is this the right way to build libhdfs?
> >
> > 0.18.3 and 0.19.1 builds libhdfs for me just fine.
> >
> >
> >
> >
> >
> > record-parser:
> >
> > compile-rcc-compiler:
> >
> > compile-core-classes:
> >[javac] Compiling 1 source file to
> /home/jim/hadoop-0.20.0/build/classes
> >
> > compile-mapred-classes:
> >[javac] Compiling 1 source file to
> /home/jim/hadoop-0.20.0/build/classes
> >
> > compile-hdfs-classes:
> >[javac] Compiling 4 source files to
> > /home/jim/hadoop-0.20.0/build/classes
> >
> > compile-core-native:
> >
> > check-c++-makefiles:
> >
> > create-c++-pipes-makefile:
> >
> > BUILD FAILED
> > /home/jim/hadoop-0.20.0/build.xml:1414: Execute failed:
> > java.io.IOException:
> > Cannot run program "/home/jim/hadoop-0.20.0/src/c++/pipes/configure" (in
> > directory "/home/jim/hadoop-0.20.0/build/c++-build/Linux-i386-32/pipes"):
> > java.io.IOException: error=13, Permission denied
> >
>


Re: Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Todd Lipcon
Hi Ryan,

To fix this you can simply chmod 755 that configure script referenced in the
error.

There is a JIRA for this that I think got committed that adds another
 task to build.xml, but it may not be in 0.20.0.

Thanks
-Todd

On Tue, Jul 14, 2009 at 11:36 AM, Ryan Smith wrote:

> I run ant clean then :
> ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1
>
> Below is the error output.
>
> Is this the right way to build libhdfs?
>
> 0.18.3 and 0.19.1 builds libhdfs for me just fine.
>
>
>
>
>
> record-parser:
>
> compile-rcc-compiler:
>
> compile-core-classes:
>[javac] Compiling 1 source file to /home/jim/hadoop-0.20.0/build/classes
>
> compile-mapred-classes:
>[javac] Compiling 1 source file to /home/jim/hadoop-0.20.0/build/classes
>
> compile-hdfs-classes:
>[javac] Compiling 4 source files to
> /home/jim/hadoop-0.20.0/build/classes
>
> compile-core-native:
>
> check-c++-makefiles:
>
> create-c++-pipes-makefile:
>
> BUILD FAILED
> /home/jim/hadoop-0.20.0/build.xml:1414: Execute failed:
> java.io.IOException:
> Cannot run program "/home/jim/hadoop-0.20.0/src/c++/pipes/configure" (in
> directory "/home/jim/hadoop-0.20.0/build/c++-build/Linux-i386-32/pipes"):
> java.io.IOException: error=13, Permission denied
>


Re: Looking for counterpart of Configure Method

2009-07-14 Thread Ted Dunning
I don't know what you mean by that.

The guarantee is that each mapper object will have close called and that map
will never be called after close is called.
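
To make the lifecycle concrete, here is a minimal sketch against the old
(org.apache.hadoop.mapred) API, along the lines of Akhil's start-a-daemon use
case; the daemon command and the key/value types are placeholders rather than
anything the framework requires:

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class DaemonMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private Process daemon;   // external helper used by map()

  @Override
  public void configure(JobConf job) {
    // Called once per map task, before the first call to map().
    try {
      daemon = new ProcessBuilder("/path/to/daemon").start();  // placeholder command
    } catch (IOException e) {
      throw new RuntimeException("could not start daemon", e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // ... talk to the daemon and emit whatever the job needs ...
    output.collect(value, value);
  }

  @Override
  public void close() throws IOException {
    // Called once per map task, after the last call to map().
    if (daemon != null) {
      daemon.destroy();
    }
  }
}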

On Tue, Jul 14, 2009 at 10:47 AM, Ramakishore Yelamanchilli <
kyela...@cisco.com> wrote:

> That close method will execute for all the map tasks running right??
>
> -Original Message-
> From: Jingkei Ly [mailto:jly.l...@googlemail.com]
> Sent: Tuesday, July 14, 2009 10:29 AM
> To: common-user@hadoop.apache.org
> Subject: Re: Looking for counterpart of Configure Method
>
> Aaron answered this question: Mapper implements Closeable, so you should
> override #close() in your Mapper, and it will be called when the map task
> finishes:
>
> http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Mapp
> er.html
>
> On Tue, Jul 14, 2009 at 5:52 PM, akhil1988  wrote:
>
> >
> > Does anyone knows the answer to my question or is there any alternative
> to
> > do
> > this?
> >
> > Thanks,
> > Akhil
> >
> >
> >
> > akhil1988 wrote:
> > >
> > > Hi All,
> > >
> > > Just like method configure in Mapper interface, I am looking for its
> > > counterpart that will perform the closing operation for a Map task. For
> > > example, in method configure I start an external daemon which is used
> > > throughout my map task and when the map task completes, I want to close
> > > that daemon. Is there any method that gets called at the end of the Map
> > > task and  can be overwritten just like configure class.
> > >
> > > Thanks,
> > > Akhil
> > >
> >
> > --
> > View this message in context:
> >
>
> http://www.nabble.com/Looking-for-counterpart-of-Configure-Method-tp24466923
> p24483501.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >
> >
>
>


-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)


Compiling libhdfs in 0.20.0 release

2009-07-14 Thread Ryan Smith
I run ant clean then :
ant compile-contrib -Dlibhdfs=1 -Dcompile.c++=1

Below is the error output.

Is this the right way to build libhdfs?

0.18.3 and 0.19.1 builds libhdfs for me just fine.





record-parser:

compile-rcc-compiler:

compile-core-classes:
[javac] Compiling 1 source file to /home/jim/hadoop-0.20.0/build/classes

compile-mapred-classes:
[javac] Compiling 1 source file to /home/jim/hadoop-0.20.0/build/classes

compile-hdfs-classes:
[javac] Compiling 4 source files to
/home/jim/hadoop-0.20.0/build/classes

compile-core-native:

check-c++-makefiles:

create-c++-pipes-makefile:

BUILD FAILED
/home/jim/hadoop-0.20.0/build.xml:1414: Execute failed: java.io.IOException:
Cannot run program "/home/jim/hadoop-0.20.0/src/c++/pipes/configure" (in
directory "/home/jim/hadoop-0.20.0/build/c++-build/Linux-i386-32/pipes"):
java.io.IOException: error=13, Permission denied


RE: could only be replicated to 0 nodes, instead of 1

2009-07-14 Thread Boyu Zhang
This happened to me too. What I did was delete the files generated by
formatting the namenode. By default, they are under /tmp/hadoop; just
delete the hadoop*** directory and re-format the namenode. If you specify
another location in your config file, go to that location and delete the
corresponding directory.

If your last job did not end correctly, you will have this kind of problem.

Hope it helps.

Boyu Zhang

Ph. D. Student
Computer and Information Sciences Department
University of Delaware

(210) 274-2104
bzh...@udel.edu
http://www.eecis.udel.edu/~bzhang

-Original Message-
From: Anthony.Fan [mailto:anthony...@hotmail.com] 
Sent: Monday, July 13, 2009 6:21 AM
To: hadoop-u...@lucene.apache.org
Subject: could only be replicated to 0 nodes, instead of 1


Hi, All

I just started using Hadoop a few days ago. I got the error message
" WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/hadoop/count/count/temp1 could only be replicated to 0 nodes, instead
of 1"
while trying to copy data files to DFS after Hadoop is started.

I did all the settings according to the
"Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)" instructions, and I
don't know what's wrong. Besides, during the process, no error message is
written to log files.

Also, according to "http://localhost.localdomain:50070/dfshealth.jsp", I
have one live namenode. In the browser, I can even see that the first data file
is created in DFS, but its size is 0.

Things I've tried:
1. Stop hadoop, re-format DFS and start hadoop again.
2. Change "localhost" to "127.0.0.1"

But neither of them works.

Could anyone help me or give me a hint?

Thanks.

Anthony
-- 
View this message in context:
http://www.nabble.com/could-only-be-replicated-to-0-nodes%2C-instead-of-1-tp24459104p24459104.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.




RE: Looking for counterpart of Configure Method

2009-07-14 Thread Ramakishore Yelamanchilli
That close method will execute for all of the map tasks that are running, right?

-Original Message-
From: Jingkei Ly [mailto:jly.l...@googlemail.com] 
Sent: Tuesday, July 14, 2009 10:29 AM
To: common-user@hadoop.apache.org
Subject: Re: Looking for counterpart of Configure Method

Aaron answered this question: Mapper implements Closeable, so you should
override #close() in your Mapper, and it will be called when the map task
finishes:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Mapp
er.html

On Tue, Jul 14, 2009 at 5:52 PM, akhil1988  wrote:

>
> Does anyone knows the answer to my question or is there any alternative to
> do
> this?
>
> Thanks,
> Akhil
>
>
>
> akhil1988 wrote:
> >
> > Hi All,
> >
> > Just like method configure in Mapper interface, I am looking for its
> > counterpart that will perform the closing operation for a Map task. For
> > example, in method configure I start an external daemon which is used
> > throughout my map task and when the map task completes, I want to close
> > that daemon. Is there any method that gets called at the end of the Map
> > task and  can be overwritten just like configure class.
> >
> > Thanks,
> > Akhil
> >
>
> --
> View this message in context:
>
http://www.nabble.com/Looking-for-counterpart-of-Configure-Method-tp24466923
p24483501.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>



Re: Looking for counterpart of Configure Method

2009-07-14 Thread Jingkei Ly
Aaron answered this question: Mapper implements Closeable, so you should
override #close() in your Mapper, and it will be called when the map task
finishes:
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/Mapper.html

On Tue, Jul 14, 2009 at 5:52 PM, akhil1988  wrote:

>
> Does anyone knows the answer to my question or is there any alternative to
> do
> this?
>
> Thanks,
> Akhil
>
>
>
> akhil1988 wrote:
> >
> > Hi All,
> >
> > Just like method configure in Mapper interface, I am looking for its
> > counterpart that will perform the closing operation for a Map task. For
> > example, in method configure I start an external daemon which is used
> > throughout my map task and when the map task completes, I want to close
> > that daemon. Is there any method that gets called at the end of the Map
> > task and  can be overwritten just like configure class.
> >
> > Thanks,
> > Akhil
> >
>
> --
> View this message in context:
> http://www.nabble.com/Looking-for-counterpart-of-Configure-Method-tp24466923p24483501.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
>


Re: Restarting a killed job from where it left

2009-07-14 Thread akhil1988

Thanks, Tom, this is what I was looking for.
Just to confirm its usage - it means that upon jobtracker restart it will
automatically discover whether it was previously running a job, and if it
was, it will restart from where it left off?

--Akhil


Tom White-3 wrote:
> 
> Hi Akhil,
> 
> Have a look at the mapred.jobtracker.restart.recover property.
> 
> Cheers,
> Tom
> 
> On Sun, Jul 12, 2009 at 12:06 AM, akhil1988 wrote:
>>
>> HI All,
>>
>> I am looking for ways to restart my hadoop job from where it left when
>> the
>> entire cluster goes down or the job gets stopped due to some reason i.e.
>> I
>> am looking for ways in which I can store at regular intervals the status
>> of
>> my job and then when I restart the job it starts from where it left
>> rather
>> than starting from the beginning again.
>>
>> Can anyone please give me some reference to read about the ways to handle
>> this.
>>
>> Thanks,
>> Akhil
>> --
>> View this message in context:
>> http://www.nabble.com/Restarting-a-killed-job-from-where-it-left-tp2618p2618.html
>> Sent from the Hadoop core-user mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Restarting-a-killed-job-from-where-it-left-tp2618p24483906.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Cluster configuration issue

2009-07-14 Thread Tamir Kamara
Hi,

To restart Hadoop on a specific node, use a command like this:
"hadoop-daemon.sh stop tasktracker", and after that the same command with
start. You can also do the same with the datanode, but it doesn't look like
there's a problem there.

I had the same problem with missing slots, but I don't remember if it happened
with maps. The fix in my case was this patch:
https://issues.apache.org/jira/browse/HADOOP-5269


Tamir

On Tue, Jul 14, 2009 at 7:25 PM, KTM  wrote:

> Hi, I'm running Hadoop 0.19.1 on a cluster with 8 machines, 7 of which are
> used as slaves and the other the master, each with 2 dual-core AMD CPUs and
> generous amounts of RAM.  I am running map-only jobs and have the slaves
> set
> up to have 4 mappers each, for a total of 28 available mappers.  When I
> first start up my cluster, I am able to use all 28 mappers.  However, after
> a short bit of time (~12 hours), jobs that I submit start using fewer
> mappers.  I restarted my cluster last night, and currently only 19 mappers
> are running tasks even though more tasks are pending, with at least 2 tasks
> running per machine - so no machine has gone down.  I have checked that the
> unused cores are actually sitting idle.  Any ideas for why this is
> happening?  Is there a way to restart Hadoop on the individual slaves?
>  Thanks! Mac
>


Re: Looking for counterpart of Configure Method

2009-07-14 Thread akhil1988

Does anyone know the answer to my question, or is there any alternative way to
do this?

Thanks,
Akhil



akhil1988 wrote:
> 
> Hi All,
> 
> Just like method configure in Mapper interface, I am looking for its
> counterpart that will perform the closing operation for a Map task. For
> example, in method configure I start an external daemon which is used
> throughout my map task and when the map task completes, I want to close
> that daemon. Is there any method that gets called at the end of the Map
> task and  can be overwritten just like configure class.
> 
> Thanks,
> Akhil
> 

-- 
View this message in context: 
http://www.nabble.com/Looking-for-counterpart-of-Configure-Method-tp24466923p24483501.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



Re: Why is Spilled Records always equal to Map output records

2009-07-14 Thread Owen O'Malley
There is no requirement that all of the reduces are running while the map is
running. The dataflow is that the map writes its output to local disk and
that the reduces pull the map outputs when they need them. There are threads
handling sorting and spill of the records to disk, but that doesn't remove
the need for the map to store its outputs to disk. (Of course, if there is
enough ram, the operating system will have the map outputs in its file cache
and not need to read from disk.)
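
As a side note, the point at which that spill thread kicks in is configurable.
A rough sketch with the old JobConf API (the property names io.sort.mb and
io.sort.spill.percent are my assumption for the 0.19/0.20 line, so please check
them against your release):

import org.apache.hadoop.mapred.JobConf;

public class SpillTuning {
  public static JobConf tune(JobConf conf) {
    // Bigger in-memory sort buffer: records are buffered longer before the
    // background spill thread writes a spill file to local disk.
    conf.setInt("io.sort.mb", 200);            // assumed default: 100 (MB)
    // Start spilling when the buffer is 90% full instead of the assumed 0.80 default.
    conf.set("io.sort.spill.percent", "0.90");
    // Even with no extra spills, Spilled Records stays >= Map output records,
    // because the final map output is always written to local disk for the
    // reduces to pull.
    return conf;
  }
}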

It is an interesting question as to what the changes would need to be to
have the maps push to the reduces, but they would be substantial.

-- Owen


Cluster configuration issue

2009-07-14 Thread KTM
Hi, I'm running Hadoop 0.19.1 on a cluster with 8 machines, 7 of which are
used as slaves and the other the master, each with 2 dual-core AMD CPUs and
generous amounts of RAM.  I am running map-only jobs and have the slaves set
up to have 4 mappers each, for a total of 28 available mappers.  When I
first start up my cluster, I am able to use all 28 mappers.  However, after
a short bit of time (~12 hours), jobs that I submit start using fewer
mappers.  I restarted my cluster last night, and currently only 19 mappers
are running tasks even though more tasks are pending, with at least 2 tasks
running per machine - so no machine has gone down.  I have checked that the
unused cores are actually sitting idle.  Any ideas for why this is
happening?  Is there a way to restart Hadoop on the individual slaves?
 Thanks! Mac


Re: more than one reducer in standalone mode

2009-07-14 Thread jason hadoop
Thanks Tom.
The single reducer is greatly limiting in local mode.

On Tue, Jul 14, 2009 at 3:15 AM, Tom White  wrote:

> There's a Jira to fix this here:
> https://issues.apache.org/jira/browse/MAPREDUCE-434
>
> Tom
>
> On Mon, Jul 13, 2009 at 12:34 AM, jason hadoop
> wrote:
> > If the jobtracker is set to local, there is no way to have more than 1
> > reducer.
> >
> > On Sun, Jul 12, 2009 at 12:21 PM, Rares Vernica 
> wrote:
> >
> >> Hello,
> >>
> >> Is it possible to have more than one reducer in standalone mode? I am
> >> currently using 0.17.2.1 and I do:
> >>
> >> job.setNumReduceTasks(4);
> >>
> >> before starting the job and it seems that Hadoop overrides the
> >> variable, as it says:
> >>
> >> 09/07/12 12:07:40 INFO mapred.MapTask: numReduceTasks: 1
> >>
> >> Thanks!
> >> Rares
> >>
> >
> >
> >
> > --
> > Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> > http://www.amazon.com/dp/1430219424?tag=jewlerymall
> > www.prohadoopbook.com a community for Hadoop Professionals
> >
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals


Re: Relation between number of map reduce tasks per node and capacity of namenode

2009-07-14 Thread jason hadoop
The namenode is pretty much driven by the number of blocks and the number of
files in your HDFS, and to a lesser extent, the rate of
create/open/write/close of files.
If you have any instability in your datanodes, there is a great increase in
namenode loading.

On Tue, Jul 14, 2009 at 4:16 AM, Hrishikesh Agashe <
hrishikesh_aga...@persistent.co.in> wrote:

> Hi,
>
> Is there any relationship between how many map and how many reduce tasks I
> am running per node and what is capacity (RAM, CPU) of my NameNode?
> i.e. if I want to run more maps and more reduce tasks per node then RAM of
> NameNode should be high?
> Similarly does NameNode capacity should be driven by how many number of
> machines are running map reduce tasks?
>
> Please let me know.
>
> --Hrishi
>
> DISCLAIMER
> ==
> This e-mail may contain privileged and confidential information which is
> the property of Persistent Systems Ltd. It is intended only for the use of
> the individual or entity to which it is addressed. If you are not the
> intended recipient, you are not authorized to read, retain, copy, print,
> distribute or use this message. If you have received this communication in
> error, please notify the sender and delete all copies of this message.
> Persistent Systems Ltd. does not accept any liability for virus infected
> mails.
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals


Re: access Configuration object in Partioner??

2009-07-14 Thread Tom White
Hi Jianmin,

Sorry - I (incorrectly) assumed you were using the old API.
Partitioners don't yet work with the new API (see
https://issues.apache.org/jira/browse/MAPREDUCE-565). However, when
they do you can make your Partitioner implement Configurable (by
extending Configured, for example), and this will give you access to
the job configuration, since the framework will set it for you on the
partitioner.
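
For illustration, a rough sketch of what that could look like (the key/value
types and the property name are made up, and it relies on MAPREDUCE-565 being
fixed, so it will not work against 0.20 as released):

import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class ConfiguredPartitioner extends Partitioner<Text, IntWritable>
    implements Configurable {

  private Configuration conf;
  private int buckets;

  public void setConf(Configuration conf) {
    // The framework calls this after instantiating the partitioner.
    this.conf = conf;
    buckets = conf.getInt("my.partitioner.buckets", 10);  // made-up property
  }

  public Configuration getConf() {
    return conf;
  }

  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    return (key.hashCode() & Integer.MAX_VALUE)
        % Math.min(buckets, numPartitions);
  }
}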

Cheers
Tom

On Tue, Jul 14, 2009 at 12:46 PM, Jianmin Woo wrote:
> Thanks a lot for your information, Tom.
>
> I am using the org.apache.hadoop.mapreduce.Partitioner in 0.20. It seems that 
> the org.apache.hadoop.mapred.Partitioner is deprecated and will be removed in 
> the futture.
> Do you have some suggestions on this?
>
> Thanks,
> Jianmin
>
>
>
>
> 
> From: Tom White 
> To: common-user@hadoop.apache.org
> Sent: Tuesday, July 14, 2009 6:03:34 PM
> Subject: Re: access Configuration object in Partioner??
>
> Hi Jianmin,
>
> Partitioner extends JobConfigurable, so you can implement the
> configure() method to access the JobConf.
>
> Hope that helps.
>
> Cheers,
> Tom
>
> On Tue, Jul 14, 2009 at 10:27 AM, Jianmin Woo wrote:
>> Hi,
>>
>> I am considering to implement a Partitioner that needs to access the 
>> parameters in Configuration of job. However, there is no straightforward way 
>> for this task. Are there any suggestions?
>>
>> Thanks,
>> Jianmin
>>
>>
>>
>>
>
>
>
>


Re: access Configuration object in Partioner??

2009-07-14 Thread Jianmin Woo
Thanks a lot for your information, Tom. 

I am using the org.apache.hadoop.mapreduce.Partitioner in 0.20. It seems that 
the org.apache.hadoop.mapred.Partitioner is deprecated and will be removed in 
the future.
Do you have some suggestions on this?

Thanks,
Jianmin





From: Tom White 
To: common-user@hadoop.apache.org
Sent: Tuesday, July 14, 2009 6:03:34 PM
Subject: Re: access Configuration object in Partioner??

Hi Jianmin,

Partitioner extends JobConfigurable, so you can implement the
configure() method to access the JobConf.

Hope that helps.

Cheers,
Tom

On Tue, Jul 14, 2009 at 10:27 AM, Jianmin Woo wrote:
> Hi,
>
> I am considering to implement a Partitioner that needs to access the 
> parameters in Configuration of job. However, there is no straightforward way 
> for this task. Are there any suggestions?
>
> Thanks,
> Jianmin
>
>
>
>



  

Relation between number of map reduce tasks per node and capacity of namenode

2009-07-14 Thread Hrishikesh Agashe
Hi,

Is there any relationship between how many map and reduce tasks I am 
running per node and the capacity (RAM, CPU) of my NameNode?
I.e., if I want to run more map and reduce tasks per node, should the RAM of 
the NameNode be higher?
Similarly, should NameNode capacity be driven by how many 
machines are running map reduce tasks?

Please let me know.

--Hrishi

DISCLAIMER
==
This e-mail may contain privileged and confidential information which is the 
property of Persistent Systems Ltd. It is intended only for the use of the 
individual or entity to which it is addressed. If you are not the intended 
recipient, you are not authorized to read, retain, copy, print, distribute or 
use this message. If you have received this communication in error, please 
notify the sender and delete all copies of this message. Persistent Systems 
Ltd. does not accept any liability for virus infected mails.


Re: more than one reducer in standalone mode

2009-07-14 Thread Tom White
There's a Jira to fix this here:
https://issues.apache.org/jira/browse/MAPREDUCE-434

Tom

On Mon, Jul 13, 2009 at 12:34 AM, jason hadoop wrote:
> If the jobtracker is set to local, there is no way to have more than 1
> reducer.
>
> On Sun, Jul 12, 2009 at 12:21 PM, Rares Vernica  wrote:
>
>> Hello,
>>
>> Is it possible to have more than one reducer in standalone mode? I am
>> currently using 0.17.2.1 and I do:
>>
>> job.setNumReduceTasks(4);
>>
>> before starting the job and it seems that Hadoop overrides the
>> variable, as it says:
>>
>> 09/07/12 12:07:40 INFO mapred.MapTask: numReduceTasks: 1
>>
>> Thanks!
>> Rares
>>
>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>


Re: access Configuration object in Partioner??

2009-07-14 Thread Tom White
Hi Jianmin,

Partitioner extends JobConfigurable, so you can implement the
configure() method to access the JobConf.
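
For example, a minimal sketch with the old (org.apache.hadoop.mapred) API,
where the key/value types and the property name are just placeholders:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class ConfiguredPartitioner implements Partitioner<Text, IntWritable> {

  private int buckets;

  public void configure(JobConf job) {
    // Called once with the job configuration before getPartition() is used.
    buckets = job.getInt("my.partitioner.buckets", 10);  // made-up property
  }

  public int getPartition(Text key, IntWritable value, int numPartitions) {
    return (key.hashCode() & Integer.MAX_VALUE)
        % Math.min(buckets, numPartitions);
  }
}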

Hope that helps.

Cheers,
Tom

On Tue, Jul 14, 2009 at 10:27 AM, Jianmin Woo wrote:
> Hi,
>
> I am considering to implement a Partitioner that needs to access the 
> parameters in Configuration of job. However, there is no straightforward way 
> for this task. Are there any suggestions?
>
> Thanks,
> Jianmin
>
>
>
>


access Configuration object in Partioner??

2009-07-14 Thread Jianmin Woo
Hi, 

I am considering implementing a Partitioner that needs to access the parameters 
in the job Configuration. However, there is no straightforward way to do this. 
Are there any suggestions?

Thanks,
Jianmin



  

Re: MultithreadedMapper in 0.20.0

2009-07-14 Thread Mathias De Maré
On Sat, Jul 11, 2009 at 9:45 PM, Marshall Schor  wrote:

> Just a wild guess: Since there are 2 Mapper classes in hadoop 20, each
> in a different package, did you import the right package when extending
> CrawlerMapper?


Yes, I noticed the same thing when using a plain Job and any other class
extending Mapper.
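
For reference, the package distinction Marshall mentions looks roughly like
this (the key/value types and the body are made up; only the imports matter):

// Old, deprecated API (org.apache.hadoop.mapred), driven by JobConf/JobClient:
//   public class CrawlerMapper extends MapReduceBase
//       implements org.apache.hadoop.mapred.Mapper<LongWritable, Text, Text, Text> { ... }

// New API (org.apache.hadoop.mapreduce), required when the job is driven by
// org.apache.hadoop.mapreduce.Job and by the new MultithreadedMapper:
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CrawlerMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // real crawling logic would go here
    context.write(value, new Text("fetched"));
  }
}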

On Tue, Jul 14, 2009 at 7:06 AM, Amareshwari Sriramadasu <
amar...@yahoo-inc.com> wrote:

> MultithreadedMapper in 0.20.0 has got bug. It got fixed in 0.21 by
> http://issues.apache.org/jira/browse/MAPREDUCE-465
>
> Thanks
> Amareshwari


Okay, that explains it then. Thank you very much. Is there a release
schedule for 0.21? Or a preliminary date for the release?

Mathias