Re: Hardware inquiry

2010-02-05 Thread Konstantin Boudnik
Oh, you might consider a colo, which will cost you 1/4 of EC2 :)

On Fri, Feb 05, 2010 at 10:59AM, Sirota, Peter wrote:
> Hi Justin,
> 
> Have you guys considered running inside Amazon Elastic MapReduce?  With this 
> service you don't have to choose your hardware across all jobs but rather pick 
> out of 7 hardware types we have available.  Also you don't have to pay 
> capital upfront but rather scale with your needs.
> 
> Let me know if we can help you to get started with Amazon Elastic MapReduce.  
>  http://aws.amazon.com/elasticmapreduce/ 
> 
> 
> 
> 
> Regards,
> Peter Sirota
> GM, Amazon Elastic MapReduce
> 
> -Original Message-
> From: Justin Becker [mailto:becker.jus...@gmail.com] 
> Sent: Wednesday, February 03, 2010 5:15 PM
> To: common-user@hadoop.apache.org
> Subject: Hardware inquiry
> 
> My organization has decided to make a substantial investment in hardware for
> processing Hadoop jobs.  Our cluster will be used by multiple groups so it's
> hard to classify the problems as IO, memory, or CPU bound.  Would others be
> willing to share their hardware profiles coupled with the problem types
> (memory, cpu, etc.).  Our current setup, for the existing cluster is made up
> of the following machines,
> 
> Poweredge 1655
> 2x2 Intel Xeon 1.4ghz
> 2GB RAM
> 72GB local HD
> 
> Poweredge 1855
> 2x2 Intel Xeon 3.2ghz
> 8GB RAM
> 146GB local HD
> 
> Poweredge 1955
> 2x2 Intel Xeon 3.0ghz
> 4GB RAM
> 72GB local HD
> 
> Obviously, we would like to increase local disk space, memory, and the
> number of cores.  The not-so-obvious decision is whether to select high end
> equipment (fewer machines) or lower-class hardware.  We're trying to balance
> "how commodity" against the administration costs.  I've read the machine
> scaling material on the Hadoop wiki.  Any additional real-world advice would
> be awesome.
> 
> 
> Thanks,
> 
> Justin


Re: problem building trunk

2010-02-26 Thread Konstantin Boudnik
In order to post any artifacts to the central repository one needs to have
special access rights. Those are available only to build-masters, basically.
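
For a local build, though, nothing needs to go to the central repository at all:
'ant mvn-install' publishes common into your local Maven repo, and the HDFS build
can be told to resolve against it. A sketch of the usual sequence from the
split-project days (the -Dresolvers=internal switch is what pointed Ivy at the
locally installed artifacts back then; double-check it against your build.xml):

  cd hadoop-common && ant mvn-install
  cd ../hadoop-hdfs && ant clean jar jar-test mvn-install -Dresolvers=internal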

On Fri, Feb 26, 2010 at 11:01AM, Massoud Mazar wrote:
> Thanks Owen,
> 
> Now hadoop-common builds, but if I use the same target (mvn-install) when 
> building hdfs, I get this error:
> 
> ant clean jar jar-test mvn-install
> 
> ivy-resolve-common:
> [ivy:resolve]
> [ivy:resolve] :: problems summary ::
> [ivy:resolve]  WARNINGS
> [ivy:resolve]   module not found: 
> org.apache.hadoop#hadoop-core;0.21.0-alpha-15
> [ivy:resolve]    apache-snapshot: tried
> [ivy:resolve] 
> https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-core/0.21.0-alpha-15/hadoop-core-0.21.0-alpha-15.pom
> [ivy:resolve] -- artifact 
> org.apache.hadoop#hadoop-core;0.21.0-alpha-15!hadoop-core.jar:
> [ivy:resolve] 
> https://repository.apache.org/content/repositories/snapshots/org/apache/hadoop/hadoop-core/0.21.0-alpha-15/hadoop-core-0.21.0-alpha-15.jar
> [ivy:resolve]    maven2: tried
> [ivy:resolve] 
> http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.21.0-alpha-15/hadoop-core-0.21.0-alpha-15.pom
> [ivy:resolve] -- artifact 
> org.apache.hadoop#hadoop-core;0.21.0-alpha-15!hadoop-core.jar:
> [ivy:resolve] 
> http://repo1.maven.org/maven2/org/apache/hadoop/hadoop-core/0.21.0-alpha-15/hadoop-core-0.21.0-alpha-15.jar
> [ivy:resolve]   ::
> [ivy:resolve]   ::  UNRESOLVED DEPENDENCIES ::
> [ivy:resolve]   ::
> [ivy:resolve]   :: org.apache.hadoop#hadoop-core;0.21.0-alpha-15: not 
> found
> [ivy:resolve]   ::
> [ivy:resolve]
> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
> 
> BUILD FAILED
> /root/Hadoop/hadoop-hdfs/build.xml:1362: impossible to resolve dependencies:
> resolve failed - see output for details
> 
> 
> 
> -Original Message-
> From: Owen O'Malley [mailto:omal...@apache.org] 
> Sent: Friday, February 26, 2010 1:44 PM
> To: Giridharan Kesavan
> Cc: common-user
> Subject: Re: problem building trunk
> 
> 
> On Feb 26, 2010, at 10:22 AM, Massoud Mazar wrote:
> 
> > I'm having issues building the trunk. I follow steps mentioned at 
> > http://wiki.apache.org/hadoop/BuildingHadoopFromSVN
> 
> 
> It is a documentation error. Giri, can you update it with the current  
> targets (ie. mvn-install)?
> 
> Thanks,
> Owen


Re: Any possible to set hdfs block size to a value smaller than 64MB?

2010-05-18 Thread Konstantin Boudnik
I once ran an experiment with a block size of 10 bytes (sic!). This was _very_ slow
on the NN side: writing 5 MB took about 25 minutes :( No fun, to
say the least...
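
For the original question, the block size is a per-file, write-time setting; a
sketch using the 0.20-era property name and a made-up file name:

  bin/hadoop fs -D dfs.block.size=1048576 -copyFromLocal video.mp4 /user/hassan/video.mp4

dfs.block.size is in bytes and has to stay a multiple of io.bytes.per.checksum;
setting it in hdfs-site.xml changes the default for all newly written files instead.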

On Tue, May 18, 2010 at 10:56AM, Konstantin Shvachko wrote:
> You can also get some performance numbers and answers to the block size 
> dilemma problem here:
> 
> http://developer.yahoo.net/blogs/hadoop/2010/05/scalability_of_the_hadoop_dist.html
> 
> I remember some people were using Hadoop for storing or streaming videos.
> Don't know how well that worked.
> It would be interesting to learn about your experience.
> 
> Thanks,
> --Konstantin
> 
> 
> On 5/18/2010 8:41 AM, Brian Bockelman wrote:
> > Hey Hassan,
> >
> > 1) The overhead is pretty small, measured in a small number of milliseconds 
> > on average
> > 2) HDFS is not designed for "online latency".  Even though the average is 
> > small, if something "bad happens", your clients might experience a lot of 
> > delays while going through the retry stack.  The initial design was for 
> > batch processing, and latency-sensitive applications came later.
> >
> > Additionally since the NN is a SPOF, you might want to consider your uptime 
> > requirements.  Each organization will have to balance these risks with the 
> > advantages (such as much cheaper hardware).
> >
> > There's a nice interview with the GFS authors here where they touch upon 
> > the latency issues:
> >
> > http://queue.acm.org/detail.cfm?id=1594206
> >
> > As GFS and HDFS share many design features, the theoretical parts of their 
> > discussion might be useful for you.
> >
> > As far as overall throughput of the system goes, it depends heavily upon 
> > your implementation and hardware.  Our HDFS routinely serves 5-10 Gbps.
> >
> > Brian
> >
> > On May 18, 2010, at 10:29 AM, Nyamul Hassan wrote:
> >
> >> This is a very interesting thread to us, as we are thinking about deploying
> >> HDFS as a massive online storage for a on online university, and then
> >> serving the video files to students who want to view them.
> >>
> >> We cannot control the size of the videos (and some class work files), as
> >> they will mostly be uploaded by the teachers providing the classes.
> >>
> >> How would the overall through put of HDFS be affected in such a solution?
> >> Would HDFS be feasible at all for such a setup?
> >>
> >> Regards
> >> HASSAN
> >>
> >>
> >>
> >> On Tue, May 18, 2010 at 21:11, He Chen  wrote:
> >>
> >>> If you know how to use AspectJ to do aspect-oriented programming, you can
> >>> write an aspect class and let it just monitor the whole process of MapReduce
> >>>
> >>> On Tue, May 18, 2010 at 10:00 AM, Patrick Angeles  wrote:
> >>>
>  Should be evident in the total job running time... that's the only metric
>  that really matters :)
> 
>  On Tue, May 18, 2010 at 10:39 AM, Pierre ANCELOT > wrote:
> 
> > Thank you,
> > Any way I can measure the startup overhead in terms of time?
> >
> >
> > On Tue, May 18, 2010 at 4:27 PM, Patrick Angeles >> wrote:
> >
> >> Pierre,
> >>
> >> Adding to what Brian has said (some things are not explicitly
> >>> mentioned
> > in
> >> the HDFS design doc)...
> >>
> >> - If you have small files that take up<  64MB you do not actually use
>  the
> >> entire 64MB block on disk.
> >> - You *do* use up RAM on the NameNode, as each block represents
>  meta-data
> >> that needs to be maintained in-memory in the NameNode.
> >> - Hadoop won't perform optimally with very small block sizes. Hadoop
>  I/O
> > is
> >> optimized for high sustained throughput per single file/block. There
> >>> is
>  a
> >> penalty for doing too many seeks to get to the beginning of each
> >>> block.
> >> Additionally, you will have a MapReduce task per small file. Each
> > MapReduce
> >> task has a non-trivial startup overhead.
> >> - The recommendation is to consolidate your small files into large
>  files.
> >> One way to do this is via SequenceFiles... put the filename in the
> >> SequenceFile key field, and the file's bytes in the SequenceFile
> >>> value
> >> field.
> >>
> >> In addition to the HDFS design docs, I recommend reading this blog
>  post:
> >> http://www.cloudera.com/blog/2009/02/the-small-files-problem/
> >>
> >> Happy Hadooping,
> >>
> >> - Patrick
> >>
> >> On Tue, May 18, 2010 at 9:11 AM, Pierre ANCELOT 
> >> wrote:
> >>
> >>> Okay, thank you :)
> >>>
> >>>
> >>> On Tue, May 18, 2010 at 2:48 PM, Brian Bockelman<
>  bbock...@cse.unl.edu
>  wrote:
> >>>
> 
>  On May 18, 2010, at 7:38 AM, Pierre ANCELOT wrote:
> 
> > Hi, thanks for this fast answer :)
> > If so, what do you mean by blocks? If a file has to be
> >>> splitted,
>  it
> >>> will
>  be
> > splitted when larger than 64MB?
> >
> 
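
To make Patrick's consolidation advice quoted above concrete, here is a minimal
sketch (0.20-era API) that packs a directory of small local files into a single
SequenceFile, with the file name as the key and the raw bytes as the value; both
paths are made up:

import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SmallFilePacker {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // One big SequenceFile on HDFS instead of thousands of tiny files.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("/user/me/packed.seq"), Text.class, BytesWritable.class);
    try {
      for (File f : new File("/data/small-files").listFiles()) {
        byte[] bytes = new byte[(int) f.length()];
        FileInputStream in = new FileInputStream(f);
        try {
          IOUtils.readFully(in, bytes, 0, bytes.length);  // small file, read it whole
        } finally {
          in.close();
        }
        // Key = original file name, value = raw contents, as suggested above.
        writer.append(new Text(f.getName()), new BytesWritable(bytes));
      }
    } finally {
      writer.close();
    }
  }
}

On the MapReduce side, SequenceFileInputFormat then gives you one split per block
rather than one task per original small file.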

Re: Hbase with Hadoop

2011-10-11 Thread Konstantin Boudnik
Matt,

I'd like to reinforce the inquiry about posting (or blogging perhaps ;) details
about Hbase/0.20.205 coexistence. I am sure lotta people will benefit from
this.

Thanks in advance,
  Cos

On Tue, Oct 11, 2011 at 08:29PM, jigneshmpatel wrote:
> Matt,
> Thanks a lot. Just wanted to have some more information. If hadoop 0.20.205.0
> is voted in by the community members then will it become a major release? And what
> if it is not approved by community members.
> 
> And as you said I do like to use 0.90.3 if it works. If it is ok, can you
> share the details of those configuration changes?
> 
> -Jignesh
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Hbase-with-Hadoop-tp3413950p3414658.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Re: Unrecognized option: -jvm

2011-10-15 Thread Konstantin Boudnik
You must've been using some awkward version of Hadoop...

The issue has been fixed a number of times (see HDFS-1943 for example).

Cos

On Sun, Oct 16, 2011 at 12:21AM, Majid Azimi wrote:
> Hi guys,
> 
> I'm really new to hadoop. I have configured a single node hadoop cluster, but
> seems that my data node is not working. job tracker log file shows this
> message(alot of them per 10 second):
> 
> 2011-10-16 00:01:15,558 WARN org.apache.hadoop.mapred.JobTracker:
> Retrying...
> 2011-10-16 00:01:15,589 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer
> Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /tmp/hadoop-root/mapred/system/jobtracker.info could only be replicated to 0
> nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
> 
> at org.apache.hadoop.ipc.Client.call(Client.java:1030)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
> at $Proxy5.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> at $Proxy5.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3104)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2975)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2255)
> at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2446)
> 
> 2011-10-16 00:01:15,589 WARN org.apache.hadoop.hdfs.DFSClient: Error
> Recovery for block null bad datanode[0] nodes == null
> 2011-10-16 00:01:15,589 WARN org.apache.hadoop.hdfs.DFSClient: Could not get
> block locations. Source file "/tmp/hadoop-root/mapred/system/jobtracker.info"
> - Aborting...
> 2011-10-16 00:01:15,590 WARN org.apache.hadoop.mapred.JobTracker: Writing to
> file hdfs://localhost/tmp/hadoop-root/mapred/system/jobtracker.info failed!
> 2011-10-16 00:01:15,593 WARN org.apache.hadoop.mapred.JobTracker: FileSystem
> is not ready yet!
> 2011-10-16 00:01:15,603 WARN org.apache.hadoop.mapred.JobTracker: Failed to
> initialize recovery manager.
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
> /tmp/hadoop-root/mapred/system/jobtracker.info could only be replicated to 0
> nodes, instead of 1
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1417)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:596)
> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:523)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1383)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1379)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:416)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1377)
> 
> at org.apache.hadoop.ipc.Client.call(Client.java:1030)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
> at $Proxy5.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:616)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)

Re: Hadoop next stable release

2011-11-22 Thread Konstantin Boudnik
We are expecting to release 0.22 very shortly. 0.22 is supposed to be
considered stable because it has been heavily tested at scale by the eBay team
(as far as I know). However, I will let 0.22's RM comment on that.

Cos

On Tue, Nov 22, 2011 at 12:05PM, Niranjan Balasubramanian wrote:
> Hello
> 
> We are currently using hadoop 0.20.203 on a 10 node cluster. We are
> considering upgrading to a newer version and I have two questions in this
> regard.
> 
> 1) It seems 0.21 is unlikely to become a stable release anytime soon and we
> are wary of moving to an unstable release. Our primary concern is the data
> we have on our hdfs.  I want to know if anyone has been using 0.21 in
> production and would like to hear about your experiences? Any advice on this
> front is appreciated. 
> 
> 2) Do we know when 0.23 is likely to become stable? There has been some
> discussion on mail #dev* about 0.23 becoming stable sometime soon. Is it
> going to happen by the end of this year? 
>  
> Thanks
> ~ Niranjan.
> 
> * - 
> http://search-hadoop.com/m/f623FA7bDK1/hadoop+next+stable+release+0.21&subj=0+21+stable+schedule+


Re: choices for deploying a small hadoop cluster on EC2

2011-11-29 Thread Konstantin Boudnik
I'd suggest you use the bits produced by BigTop (cross-posting to the bigtop-dev@
list), which also ship Puppet recipes allowing for fully automated deployment and
configuration. BigTop also uses the Jenkins EC2 plugin for the deployment part, and it
seems to work really great!

Cos

On Tue, Nov 29, 2011 at 12:28PM, Periya.Data wrote:
> Hi All,
> I am just beginning to learn how to deploy a small cluster (a 3
> node cluster) on EC2. After some quick Googling, I see the following
> approaches:
> 
>1. Use Whirr for quick deployment and tearing down. Uses CDH3. Does it
>have features for persisting (EBS)?
>2. CDH Cloud Scripts - has EC2 AMI - again for temp Hadoop clusters/POC
>etc. Good stuff - I can persist using EBS snapshots. But, this uses CDH2.
>3. Install hadoop manually and related stuff like Hive...on each cluster
>node...on EC2 (or use some automation tool like Chef). I do not prefer it.
>4. Hadoop distribution comes with EC2 (under src/contrib) and there are
>several Hadoop EC2 AMIs available. I have not studied enough to know if
>that is easy for a beginner like me.
>5. Anything else??
> 
> 1 and 2 look promising as a beginner. If any of you have any thoughts about
> this, I would like to know (like what to keep in mind, what to take care
> of, caveats etc). I want my data /config to persist (using EBS) and
> continue from where I left off...(after a few days).  Also, I want to have
> HIVE and SQOOP installed. Can this done using 1 or 2? Or, will installation
> of them have to be done manually after I set up the cluster?
> 
> Thanks very much,
> 
> PD.


Re: Automate Hadoop installation

2011-12-05 Thread Konstantin Boudnik
There's that great project called BigTop (in the Apache Incubator) which
provides for building the Hadoop stack.

Part of what it provides is a set of Puppet recipes which will allow you
to do exactly what you're looking for, with perhaps some minor corrections.

Seriously, look at Puppet; otherwise you will be living through a nightmare of
configuration mismanagement.

Cos

On Mon, Dec 05, 2011 at 04:02PM, praveenesh kumar wrote:
> Hi all,
> 
> Can anyone guide me how to automate the hadoop installation/configuration
> process?
> I want to install hadoop on 10-20 nodes which may even exceed to 50-100
> nodes ?
> I know we can use some configuration tools like puppet/or shell-scripts ?
> Has anyone done it ?
> 
> How can we do hadoop installations on so many machines parallely ? What are
> the best practices for this ?
> 
> Thanks,
> Praveenesh


Re: HDFS Backup nodes

2011-12-13 Thread Konstantin Boudnik
On Tue, Dec 13, 2011 at 11:00PM, M. C. Srivas wrote:
> Suresh,
> 
> As of today, there is no option except to use NFS.  And as you yourself
> mention, the first HA prototype when it comes out will require NFS.

Well, in the interest of full disclosure, NFS is just one of the options and
not the only one. Any auxiliary storage will do nicely. Distributed in-memory
redundant storage for sub-second fail-over? Sure, GigaSpaces has been doing this for
years using very mature JINI.

NFS just happens to be readily available in any data center and doesn't
require much extra investment on top of what already exists. NFS comes with its
own set of problems, of course. First and foremost is No-File-Security, which
requires the use of something like Kerberos for third-party user management. And
when paired with something like LinuxTaskController it can produce some very
interesting effects.
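
To make the NFS option concrete, the recipe Joey Echeverria describes further down
this thread boils down to something like the following; the server name and paths
are made up, the mount options are his:

  mount -t nfs -o tcp,soft,intr,timeo=10,retrans=10 filer:/export/nn-meta /mnt/nn-meta

and then listing /mnt/nn-meta in dfs.name.dir next to the local directory.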

Cos

> (a) I wasn't aware that Bookkeeper had progressed that far. I wonder
> whether it would be able to keep up with the data rates that is required in
> order to hold the NN log without falling behind.
> 
> (b) I do know Karthik Ranga at FB just started a design to put the NN data
> in HDFS itself, but that is in very preliminary design stages with no real
> code there.
> 
> The problem is that the HA code written with NFS in mind is very different
> from the HA code written with HDFS in mind, which are both quite different
> from the code that is written with Bookkeeper in mind. Essentially the
> three options will form three different implementations, since the failure
> modes of each of the back-ends are different. Am I totally off base?
> 
> thanks,
> Srivas.
> 
> 
> 
> 
> On Tue, Dec 13, 2011 at 11:00 AM, Suresh Srinivas 
> wrote:
> 
> > Srivas,
> >
> > As you may know already, NFS is just being used in the first prototype for
> > HA.
> >
> > Two options for editlog store are:
> > 1. Using BookKeeper. Work has already completed on trunk towards this. This
> > will replace need for NFS to  store the editlogs and is highly available.
> > This solution will also be used for HA.
> > 2. We have a short term goal also to enable editlogs going to HDFS itself.
> > The work is in progress.
> >
> > Regards,
> > Suresh
> >
> >
> > >
> > > -- Forwarded message --
> > > From: M. C. Srivas 
> > > Date: Sun, Dec 11, 2011 at 10:47 PM
> > > Subject: Re: HDFS Backup nodes
> > > To: common-user@hadoop.apache.org
> > >
> > >
> > > You are out of luck if you don't want to use NFS, and yet want redundancy
> > > for the NN.  Even the new "NN HA" work being done by the community will
> > > require NFS ... and the NFS itself needs to be HA.
> > >
> > > But if you use a Netapp, then the likelihood of the Netapp crashing is
> > > lower than the likelihood of a garbage-collection-of-death happening in
> > the
> > > NN.
> > >
> > > [ disclaimer:  I don't work for Netapp, I work for MapR ]
> > >
> > >
> > > On Wed, Dec 7, 2011 at 4:30 PM, randy  wrote:
> > >
> > > > Thanks Joey. We've had enough problems with nfs (mainly under very high
> > > > load) that we thought it might be riskier to use it for the NN.
> > > >
> > > > randy
> > > >
> > > >
> > > > On 12/07/2011 06:46 PM, Joey Echeverria wrote:
> > > >
> > > >> Hey Rand,
> > > >>
> > > >> It will mark that storage directory as failed and ignore it from then
> > > >> on. In order to do this correctly, you need a couple of options
> > > >> enabled on the NFS mount to make sure that it doesn't retry
> > > >> infinitely. I usually run with the tcp,soft,intr,timeo=10,**retrans=10
> > > >> options set.
> > > >>
> > > >> -Joey
> > > >>
> > > >> On Wed, Dec 7, 2011 at 12:37 PM,  wrote:
> > > >>
> > > >>> What happens then if the nfs server fails or isn't reachable? Does
> > hdfs
> > > >>> lock up? Does it gracefully ignore the nfs copy?
> > > >>>
> > > >>> Thanks,
> > > >>> randy
> > > >>>
> > > >>> - Original Message -
> > > >>> From: "Joey Echeverria"
> > > >>> To: common-user@hadoop.apache.org
> > > >>> Sent: Wednesday, December 7, 2011 6:07:58 AM
> > > >>> Subject: Re: HDFS Backup nodes
> > > >>>
> > > >>> You should also configure the Namenode to use an NFS mount for one of
> > > >>> it's storage directories. That will give the most up-to-date back of
> > > >>> the metadata in case of total node failure.
> > > >>>
> > > >>> -Joey
> > > >>>
> > > >>> On Wed, Dec 7, 2011 at 3:17 AM, praveenesh kumar<
> > praveen...@gmail.com>
> > > >>>  wrote:
> > > >>>
> > >  This means still we are relying on Secondary NameNode idealogy for
> > >  Namenode's backup.
> > >  Can OS-mirroring of Namenode is a good alternative keep it alive all
> > > the
> > >  time ?
> > > 
> > >  Thanks,
> > >  Praveenesh
> > > 
> > >  On Wed, Dec 7, 2011 at 1:35 PM, Uma Maheswara Rao G<
> > >  mahesw...@huawei.com>wrote:
> > > 
> > >   AFAIK backup node introduced in 0.21 version onwards.
> > > > __**__
> > > > From: praveenesh kumar [pra

Re: HDFS Backup nodes

2011-12-14 Thread Konstantin Boudnik
On Wed, Dec 14, 2011 at 10:09AM, Scott Carey wrote:
> 
> On 12/13/11 11:28 PM, "Konstantin Boudnik"  wrote:
> 
> >On Tue, Dec 13, 2011 at 11:00PM, M. C. Srivas wrote:
> >> Suresh,
> >> 
> >> As of today, there is no option except to use NFS.  And as you yourself
> >> mention, the first HA prototype when it comes out will require NFS.
> >
> >
> >NFS is just happen to be readily available in any data center and doesn't
> >require much of the extra investment on top of what exists.
> 
> That is a false assumption.  I'm not buying a netapp filer just for this.
>  We have no NFS, or want any.  If we ever use it, it won't be in the data
> center with Hadoop!

It isn't a false assumption, it is a reasonable one based on experience.
You don't need a NetApp for NFS; you can have a Thumper or whatever. I am not
saying NFS is the only or the best option; all I said is that it is pretty common ;) I
would opt for a BK or Jini-Spaces-like solution any day, though.

Cos



Re: Fastest HDFS loader

2011-12-20 Thread Konstantin Boudnik
Do you have some strict performance requirement or something? Because 5 GB is
pretty much nothing, really. I'd say copyFromLocal will do just fine.
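
A sketch with made-up paths:

  bin/hadoop fs -copyFromLocal /data/export.tsv /user/hive/warehouse/export/export.tsv

If the file sits on a box outside the cluster, running the same command there
against the cluster's fs.default.name saves one intermediate copy.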

Cos

On Tue, Dec 20, 2011 at 10:32PM, Edmon Begoli wrote:
> Hi,
> 
> We are going to be loading 4-5 GB text, delimited file from a RHEL file
> system into HDFS to be managed
> as external table by Hive.
> 
> What is the recommended, fastest loading mechanism?
> 
> Thank you,
> Edmon




Re: Some question about fault Injection

2011-12-29 Thread Konstantin Boudnik
I suggest starting with the fault injection tests. They can be found under
  src/test/aop/org/apache/hadoop
for HDFS in 0.22. HDFS has the best coverage by fault injection.
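
In the 0.21/0.22 ant build they are driven by dedicated targets; a sketch based on
the fault injection guide of that era (target names from memory, so verify against
build.xml):

  ant injectfaults                 # weave the AspectJ fault advices into the classes
  ant run-test-hdfs-fault-inject   # run the HDFS fault injection test suite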

The tests exist in a similar location in trunk, but they aren't hooked up to
the Maven build system yet.

Cos

On Thu, Dec 29, 2011 at 03:04AM, sangroya wrote:
> Hi,
> 
> Is there any good documentation to start with fault injection. Please share
> if there is any link to any examples that demonstrate the use of fault
> injection.
> 
> 
> Thanks,
> Amit
> 
> 
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Re-Some-question-about-fault-Injection-tp2555954p3618633.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


Re: Professional Hiring: Architect and Developer in Hadoop Area ( Beijing, China )

2012-04-09 Thread Konstantin Boudnik
TLDR :/

Besides, it isn't a job list

Cos

On Mon, Apr 09, 2012 at 10:59PM, Bing Li wrote:
> The development center of a world-famous large IT company (ranked top 3) is hiring Hadoop technical experts (Beijing); not a headhunter posting
> 
> Job description:
> Hadoop system and platform development (architects, senior developers)
> 
> 
> Job requirements:
> 
> 1. Experience designing and developing large distributed systems (3+ years of work experience, 5+ years for architects); large-scale hands-on Hadoop experience preferred
> 
> 2. Good programming and debugging skills (Java or C++/C), solid computer science fundamentals, ability to learn quickly
> 3. Strong communication and collaboration skills, fluent English (including spoken)
> 
> *We offer competitive compensation; welcome to join us*
> 
> If interested, please send your resume to: sarah.lib...@gmail.com


Re: Permission request from "Translation for education"

2012-06-19 Thread Konstantin Boudnik
Hi Vseslava.

This part of the ASF FAQ explains everything in this regard, I think:

https://www.apache.org/foundation/license-faq.html#Translation

In other words "Sure!" ;)

Cos

On Tue, Jun 19, 2012 at 04:34AM, vseslava.kavch...@gmail.com wrote:
> Hey there,
> 
> I am a student at the Department of Foreign Languages and at the
> same time a volunteer at an organization named “Translation for
> Education”. I love surfing on the Internet and being informed about
> the latest happenings around me. Unfortunately, most of my fellow
> citizens don’t know English, so that makes them feel somehow
> excluded from all this variety of useful info from the Internet.
> That was the reason that made me decide to start being a volunteer
> at such an NGO. So I created a blog, where I post translations of
> some of the texts that really caught my attention. Sometimes it’s a
> simple description of some organization, and sometimes it might be a
> narrowly specialized scientific article.
> Don’t you think that I do that without authors’ permission! I do ask
> them if they let me translate and then post the translation on my
> blog. And if they don’t, then they don’t.
> So I am asking you about the same, actually. Could you provide me
> the permission to translate the article on page
> http://hadoop.apache.org/common/docs/stable/hdfs_user_guide.html
> into , on condition that the translation is absolutely
> non-commercial and I will mention and credit you as the author of
> the article, and put the link to your source next to my translation
> too.
> 
> Hoping for your understanding and waiting for your answer!
> 
> Cheers,
> 
> 


Re: Hadoop Installation on Ubuntu

2012-07-03 Thread Konstantin Boudnik
There's also the BigTop distribution that includes Hadoop:
  https://incubator.apache.org/bigtop/

Also, it seems that a 1.0.3-based stack in binary form is available for
download from this site:

http://www.magnatempusgroup.net/ftphost/releases/latest/ubuntu/
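
On the specific chown errors quoted below: the .deb expects the hadoop group and
the hdfs/mapred service users to exist. A minimal sketch of creating them by hand
before re-running the setup script (account names taken from the error messages):

  groupadd hadoop
  useradd -r -g hadoop hdfs
  useradd -r -g hadoop mapred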

Cos

On Tue, Jul 03, 2012 at 10:03PM, Ying Huang wrote:
> Hello,
> I have downloaded hadoop_1.0.3-1_x86_64.deb from hadoop official
> website, and installed using command under root privileged.
> dpkg -i hadoop_1.0.3-1_x86_64.deb
> But there is an error: chown: invalid group: `root:hadoop'.
> And later, when I running hadoop-setup-single-node.sh, I got
> lots of chown errors:
> 
> Proceed with setup? (y/n) y
> chown: invalid user: `mapred:hadoop'
> chown: invalid user: `hdfs:hadoop'
> chown: invalid user: `mapred:hadoop'
> chown: invalid user: `hdfs:hadoop'
> chown: invalid group: `root:hadoop'
> chown: invalid user: `hdfs:hadoop'
> chown: invalid user: `mapred:hadoop'
> chown: invalid group: `root:hadoop'
> chown: invalid group: `root:hadoop'
> chown: invalid group: `root:hadoop'
> My java version is
> 
> java version "1.6.0_24"
> OpenJDK Runtime Environment (IcedTea6 1.11.1) (6b24-1.11.1-4ubuntu3)
> OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)
> 
> Should I manually create the group "hadoop", if so, with what
> command or right mode, 444 or 777?
> And is there detail document about how to install and setup
> using deb file?
> 
> 
> -- Best Regards
> Ying Huang
> 
> 




Re: Hadoop automated tests

2013-10-16 Thread Konstantin Boudnik
[Cc bigtop-dev@]

We have stack tests as a part of the Bigtop project. We don't do fault injection
tests like you describe just yet, but that would be a great contribution to the
project.

Cos

On Wed, Oct 16, 2013 at 02:12PM, hdev ml wrote:
> Hi all,
> 
> Are there automated tests available for testing sanity of hadoop layer and
> also for negative tests i.e. One Data node going down, HBase Region Server
> going down, Namenode, Jobtracker etc.
> 
> By Hadoop Layer I am asking  about Hadoop, MapReduce, HBase, Zookeeper.
> 
> What does hadoop dev team use for this? Any pointers, documentation
> articles would help a lot.
> 
> Thanks
> Harshad


Re: where are the old hadoop documentations for v0.22.0 and below ?

2014-07-28 Thread Konstantin Boudnik
I think your best bet might be to check out the release tag for the 0.22
release and check the docs out there. Perhaps you might want to run 'ant
docs' or whatever the target used to be back then.
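
A sketch, assuming the release tags still sit in the usual place in the ASF
subversion repo and that forrest is installed (both paths are examples):

  svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-0.22.0/ hadoop-common-0.22.0
  cd hadoop-common-0.22.0
  ant docs -Dforrest.home=/opt/apache-forrest-0.8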

Cos

On Mon, Jul 28, 2014 at 04:06PM, Jane Wayne wrote:
> where can i get the old hadoop documentation (e.g. cluster setup, xml
> configuration params) for hadoop v0.22.0 and below? i downloaded the source
> and binary files but could not find the documentations as a part of the
> archive file.
> 
> on the home page at http://hadoop.apache.org/, i only see documentations
> for the following versions.
> - current, stable, 1.2.1, 2.2.0, 2.4.1, 0.23.11


Re: Why hadoop is written in java?

2010-10-11 Thread Konstantin Boudnik
To second your point ;-) It reminds me of the time when Sun Micro bought GridEngine
(a C app). Me and a couple of other folks were developing a Distributed Task Execution
Framework (written in Java on top of JINI).

Every time a new version of, eh... Windows was coming around the corner, the Grid
people were screaming. Guess how easy it was for us ;)

Cos

On Sat, Oct 09, 2010 at 11:07PM, Arvind Kalyan wrote:
> On Sat, Oct 9, 2010 at 9:40 PM, elton sky  wrote:
> 
> > I always have this question but couldn't find proper answer for this. For
> > system level applications, c/c++ is preferable. But why this one using
> > java?
> >
> 
> 
> Look at the system (software) requirements for running Hadoop:
> http://hadoop.apache.org/common/docs/current/single_node_setup.html#PreReqs
> 
> Imagine how it would be, if it were to be written in C/C++.
> 
> While C/C++ might give you a performance improvement at run-time, it can be
> a total nightmare to develop and maintain. Especially if the network gets to
> be heterogeneous.
> 
> 
> 
> -- 
> Arvind Kalyan
> http://www.linkedin.com/in/base16
> h: (408) 331-7921 m: (541) 971-9225


Re: load a serialized object in hadoop

2010-10-13 Thread Konstantin Boudnik
You should have no space here "-D HADOOP_CLIENT_OPTS"
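
Either way, the form that works (per Luke's suggestion further down the thread) is
to set it as an environment variable in front of the command rather than as a -D
definition:

  HADOOP_CLIENT_OPTS=-Xmx4000m bin/hadoop jar Test.jar OOloadtest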

On Wed, Oct 13, 2010 at 04:21PM, Shi Yu wrote:
> Hi,  thanks for the advice. I tried with your settings,
> $ bin/hadoop jar Test.jar OOloadtest -D HADOOP_CLIENT_OPTS=-Xmx4000m
> 
> still no effect. Or this is a system variable? Should I export it?
> How to configure it?
> 
> Shi
> 
>  java -Xms3G -Xmx3G -classpath 
> .:WordCount.jar:hadoop-0.19.2-core.jar:lib/log4j-1.2.15.jar:lib/commons-collections-3.2.1.jar:lib/stanford-postagger-2010-05-26.jar
> OOloadtest
> 
> 
> On 2010-10-13 15:28, Luke Lu wrote:
> >On Wed, Oct 13, 2010 at 12:27 PM, Shi Yu  wrote:
> >>I haven't implemented anything in map/reduce yet for this issue. I just try
> >>to invoke the same java class using   bin/hadoop  command.  The thing is a
> >>very simple program could be executed in Java, but not doable in bin/hadoop
> >>command.
> >If you are just trying to use bin/hadoop jar your.jar command, your
> >code runs in a local client jvm and mapred.child.java.opts has no
> >effect. You should run it with HADOOP_CLIENT_OPTS=-Xmx1000m bin/hadoop
> >jar your.jar
> >
> >>I think if I couldn't get through the first stage, even I had a
> >>map/reduce program it would also fail. I am using Hadoop 0.19.2. Thanks.
> >>
> >>Best Regards,
> >>
> >>Shi
> >>
> >>On 2010-10-13 14:15, Luke Lu wrote:
> >>>Can you post your mapper/reducer implementation? or are you using
> >>>hadoop streaming? for which mapred.child.java.opts doesn't apply to
> >>>the jvm you care about. BTW, what's the hadoop version you're using?
> >>>
> >>>On Wed, Oct 13, 2010 at 11:45 AM, Shi Yuwrote:
> >>>
> Here is my code. There is no Map/Reduce in it. I could run this code
> using
> java -Xmx1000m ,  however, when using  bin/hadoop  -D
> mapred.child.java.opts=-Xmx3000M   it has heap space not enough error.  I
> have tried other program in Hadoop with the same settings so the memory
> is
> available in my machines.
> 
> 
> public static void main(String[] args) {
>    try{
>  String myFile = "xxx.dat";
>  FileInputStream fin = new FileInputStream(myFile);
>  ois = new ObjectInputStream(fin);
>  margintagMap = ois.readObject();
>  ois.close();
>  fin.close();
>  }catch(Exception e){
>  //
> }
> }
> 
> On 2010-10-13 13:30, Luke Lu wrote:
> 
> >On Wed, Oct 13, 2010 at 8:04 AM, Shi Yu  wrote:
> >
> >
> >>As a coming-up to the my own question, I think to invoke the JVM in
> >>Hadoop
> >>requires much more memory than an ordinary JVM.
> >>
> >>
> >That's simply not true. The default mapreduce task Xmx is 200M, which
> >is much smaller than the standard jvm default 512M and most users
> >don't need to increase it. Please post the code reading the object (in
> >hdfs?) in your tasks.
> >
> >
> >
> >>I found that instead of
> >>serialization the object, maybe I could create a MapFile as an index to
> >>permit lookups by key in Hadoop. I have also compared the performance
> >>of
> >>MongoDB and Memcache. I will let you know the result after I try the
> >>MapFile
> >>approach.
> >>
> >>Shi
> >>
> >>On 2010-10-12 21:59, M. C. Srivas wrote:
> >>
> >>
> On Tue, Oct 12, 2010 at 4:50 AM, Shi Yu
>   wrote:
> 
> 
> 
> 
> >Hi,
> >
> >I want to load a serialized HashMap object in hadoop. The file of
> >stored
> >object is 200M. I could read that object efficiently in JAVA by
> >setting
> >
> >
> >
> -Xmx
> 
> 
> 
> >as 1000M.  However, in hadoop I could never load it into memory. The
> >code
> >
> >
> >
> is
> 
> 
> 
> >very simple (just read the ObjectInputStream) and there is yet no
> >
> >
> >
> map/reduce
> 
> 
> 
> >implemented.  I set the  mapred.child.java.opts=-Xmx3000M, still get
> >the
> >"java.lang.OutOfMemoryError: Java heap space"  Could anyone explain
> >a
> >
> >
> >
> little
> 
> 
> 
> >bit how memory is allocate to JVM in hadoop. Why hadoop takes up so
> >much
> >memory?  If a program requires 1G memory on a single node, how much
> >
> >
> >
> memory
> 
> 
> 
> >it requires (generally) in Hadoop?
> >
> >
> >
> 
> >>>The JVM reserves swap space in advance, at the time of launching the
> >>>process. If your swap is too low (or do not have any swap configured),
> >>>you
> >>>will hit this.
> >>>
> >>>Or, you are on a 32-bit ma

Re: NullPointerException (Text.java:388)

2010-10-14 Thread Konstantin Boudnik
I quickly looked to see whether a similar bug has been filed already and couldn't
find one. Do you mind opening a JIRA for this?

Thanks,
  Cos

On Thu, Oct 14, 2010 at 02:42PM, Vitaliy Semochkin wrote:
> Hi,
> 
> during map phase I recieved following expcetion
> 
> java.lang.NullPointerException
>   at org.apache.hadoop.io.Text.encode(Text.java:388)
>   at org.apache.hadoop.io.Text.encode(Text.java:369)
>   at org.apache.hadoop.io.Text.writeString(Text.java:409)
>   at org.apache.hadoop.mapreduce.Counter.write(Counter.java:77)
>   at org.apache.hadoop.mapred.Counters$Group.write(Counters.java:311)
>   at org.apache.hadoop.mapred.Counters.write(Counters.java:491)
>   at org.apache.hadoop.mapred.TaskStatus.write(TaskStatus.java:370)
>   at 
> org.apache.hadoop.io.ObjectWritable.writeObject(ObjectWritable.java:159)
>   at org.apache.hadoop.ipc.RPC$Invocation.write(RPC.java:112)
>   at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:475)
>   at org.apache.hadoop.ipc.Client.call(Client.java:721)
>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>   at org.apache.hadoop.mapred.$Proxy0.statusUpdate(Unknown Source)
>   at org.apache.hadoop.mapred.Task.statusUpdate(Task.java:705)
>   at org.apache.hadoop.mapred.Task.sendLastUpdate(Task.java:729)
>   at org.apache.hadoop.mapred.Task.done(Task.java:695)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:309)
>   at org.apache.hadoop.mapred.Child.main(Child.java:170)
> 
> from the stack trace I see that the exception happens in hadoop code
> 
> encode(Text.java:388) -  ByteBuffer bytes =
> encoder.encode(CharBuffer.wrap(string.toCharArray()));
> 
> I guess string was null, but what could cause such argument during map?
> Can it be an empty file or empty line?
> 
> Thanks in Advance,
> Vitaliy S


Re: http://repo1.maven.org/maven2/junit/junit/3.8.1/junit-3.8.1.pom: invalid sha1:

2010-10-30 Thread Konstantin Boudnik
I assume you're trying to build 0.20+. Later projects use a later version of
JUnit... Running the build...

[ivy:resolve] downloading 
http://repo1.maven.org/maven2/junit/junit/3.8.1/junit-3.8.1.jar ...
[ivy:resolve] 

 (118kB)
[ivy:resolve] .. (0kB)
[ivy:resolve]   [SUCCESSFUL ] junit#junit;3.8.1!junit.jar (1486ms)

Looks like your ivy cache is hosed or something similar.
  Cos

On Sat, Oct 30, 2010 at 06:43PM, bharath v wrote:
> Hi ,
> 
> I am getting the error
> 
> http://repo1.maven.org/maven2/junit/junit/3.8.1/junit-3.8.1.pom:
> invalid sha1:  ..
> 
> Is it downloading the corrupt file or is there any other thing which I
> need to take care of ??
> 
> Iam getting the same error even after several tries ..:/
> 
> Any help?
> 
> Thanks


Re: http://repo1.maven.org/maven2/junit/junit/3.8.1/junit-3.8.1.pom: invalid sha1:

2010-10-30 Thread Konstantin Boudnik
You can at least try to clean your local Ivy cache for junit artifacts
(i.e. rm -rf ~/.ivy/junit*). That'd be a pretty reasonable first step.

On Sun, Oct 31, 2010 at 11:33AM, bharath v wrote:
> Hi ,
> 
> 
> Thanks for your reply .
> 
> I am using had-0.20.0 . I am new to ivy thingy .. So is there any
> solution to this ?
> 
> Thanks
> 
> 
> On Sun, Oct 31, 2010 at 3:42 AM, Konstantin Boudnik  wrote:
> > I assume you're trying to build 0.20+. Later projects uses later version of
> > junit... Running the build...
> >
> > [ivy:resolve] downloading 
> > http://repo1.maven.org/maven2/junit/junit/3.8.1/junit-3.8.1.jar ...
> > [ivy:resolve] 
> > 
> >  (118kB)
> > [ivy:resolve] .. (0kB)
> > [ivy:resolve]   [SUCCESSFUL ] junit#junit;3.8.1!junit.jar (1486ms)
> >
> > Looks like your ivy cache is hosed or something similar.
> >   Cos
> >
> > On Sat, Oct 30, 2010 at 06:43PM, bharath v wrote:
> >> Hi ,
> >>
> >> I am getting the error
> >>
> >> http://repo1.maven.org/maven2/junit/junit/3.8.1/junit-3.8.1.pom:
> >> invalid sha1:  ..
> >>
> >> Is it downloading the corrupt file or is there any other thing which I
> >> need to take care of ??
> >>
> >> Iam getting the same error even after several tries ..:/
> >>
> >> Any help?
> >>
> >> Thanks
> >


Re: [SOLVED] Re: http://repo1.maven.org/maven2/junit/junit/3.8.1/junit-3.8.1.pom: invalid sha1:

2010-10-31 Thread Konstantin Boudnik
That might be an indication that your original ivy jar was corrupt in some
way. As an experiment you might want to try to revert the changes to
library.properties, remove the ivy jars, and run the build process once more.

The fact that I (and seemingly many other folks) are able to work with the
previous version suggests that the older Ivy is OK.

Cos

On Sun, Oct 31, 2010 at 12:55PM, bharath v wrote:
> It is a problem with the ivy jar ... Change the version of ivy to
> ivy-2.2.0 in library.properties file and ant downloads the correct
> version from maven repo and the build happens correctly  ..
> 
>  Hope this helps some ppl who are facing the same problems .
> 
> :)
> 
> On Sun, Oct 31, 2010 at 12:22 PM, bharath vissapragada
>  wrote:
> > Hi ,
> >
> >
> > I did that already and it is now working ... I removed the entire ivy2
> > (~/.ivy2/* ) directory
> > (My ivy folder is .ivy2 and not .ivy .. Any problem with this ?)
> >
> > Now I am getting more errors (as I have cleaned the entire cache) as 
> > follows..
> >
> >
> > Any problem with the ivy/ant version ? ( I read in some forums abt the
> > related errors)
> >
> > My ant version is 1.7.0 ..
> >
> >
> > --ERROR---
> >
> > clover:
> > proxy.setup:
> > ivy-download:
> >      [get] Getting:
> > http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.0.0-rc2/ivy-2.0.0-rc2.jar
> >      [get] To: /home/rip/workspace/hadoop/ivy/ivy-2.0.0-rc2.jar
> >      [get] Not modified - so not downloaded
> > ivy-init-dirs:
> > ivy-probe-antlib:
> > ivy-init-antlib:
> > ivy-init:
> > [ivy:configure] :: Ivy 2.0.0-rc2 - 20081028224207 ::
> > http://ant.apache.org/ivy/ ::
> > :: loading settings :: file = /home/rip/workspace/hadoop/ivy/ivysettings.xml
> > ivy-resolve-common:
> > [ivy:resolve] :: resolving dependencies ::
> > org.apache.hadoop#Hadoop;work...@cloud
> > [ivy:resolve]   confs: [common]
> > [ivy:resolve]   found log4j#log4j;1.2.15 in maven2
> > [ivy:resolve]   found xmlenc#xmlenc;0.52 in maven2
> > [ivy:resolve]   found net.java.dev.jets3t#jets3t;0.6.1 in maven2
> > [ivy:resolve]   found commons-net#commons-net;1.4.1 in maven2
> > [ivy:resolve]   found org.mortbay.jetty#servlet-api-2.5;6.1.14 in maven2
> > [ivy:resolve]   found org.mortbay.jetty#jetty;6.1.14 in maven2
> > [ivy:resolve]   found org.mortbay.jetty#jetty-util;6.1.14 in maven2
> > [ivy:resolve]   found tomcat#jasper-runtime;5.5.12 in maven2
> > [ivy:resolve]   found tomcat#jasper-compiler;5.5.12 in maven2
> > [ivy:resolve]   found commons-el#commons-el;1.0 in maven2
> > [ivy:resolve]   found org.slf4j#slf4j-api;1.4.3 in maven2
> > [ivy:resolve]   found org.eclipse.jdt#core;3.1.1 in maven2
> > [ivy:resolve]   found org.slf4j#slf4j-log4j12;1.4.3 in maven2
> > [ivy:resolve] :: resolution report :: resolve 1236ms :: artifacts dl 43ms
> >        -
> >        |                  |            modules            ||   artifacts   |
> >        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
> >        -
> >        |      common      |   19  |   0   |   0   |   0   ||   13  |   0   |
> >        -
> > [ivy:resolve] :: problems summary ::
> > [ivy:resolve]  WARNINGS
> > [ivy:resolve]   problem while downloading module descriptor:
> > http://repo1.maven.org/maven2/commons-logging/commons-logging/1.0.4/commons-logging-1.0.4.pom:
> > invalid sha1: expected=�
> >
> > 
> >
> >
> > On Sun, Oct 31, 2010 at 12:06 PM, Konstantin Boudnik  
> > wrote:
> >> You can at least try to clean you local Ivy cache for junit artifacts
> >> (i.e. rm -rf ~/.ivy/junit*). That'd be a pretty reasonable first step.
> >>
> >> On Sun, Oct 31, 2010 at 11:33AM, bharath v wrote:
> >>> Hi ,
> >>>
> >>>
> >>> Thanks for your reply .
> >>>
> >>> I am using had-0.20.0 . I am new to ivy thingy .. So is there any
> >>> solution to this ?
> >>>
> >>> Thanks
> >>>
> >>>
> >>> On Sun, Oct 31, 2010 at 3:42 AM, Konstantin Boudnik  
> >>> wrote:
> >>> > I assume you're trying to build 0.20+. Later projects uses later 
> >>> > version o

Re: [SOLVED] Re: http://repo1.maven.org/maven2/junit/junit/3.8.1/junit-3.8.1.pom: invalid sha1:

2010-10-31 Thread Konstantin Boudnik
You can either:
 - build it yourself (I wouldn't recommend it unless you know what you're doing)
 - get a ready-made Apache release:
   http://hadoop.apache.org/common/releases.html
   http://hadoop.apache.org/hdfs/releases.html
   http://hadoop.apache.org/mapreduce/releases.html
 - get the Cloudera or Y! distributions from
   http://www.cloudera.com/downloads/
   or
   http://developer.yahoo.com/hadoop/
The former also has commercial support. There's also an IBM distro, but I don't know
anything about it except that it exists.

Hope it helps
  Cos

On Mon, Nov 01, 2010 at 10:10AM, Thiwanka Somasiri wrote:
> Hi all,
>I am new to Hadoop and need to use Mahout for my final year project. How
> can i download Apache Hadoop?
> Thanks.




Re: 0.21 found interface but class was expected

2010-11-13 Thread Konstantin Boudnik
As much as I love ranting, I can't help but wonder if there were any promises
to make 0.21+ backward compatible with <0.20?

Just curious.

On Sat, Nov 13, 2010 at 02:50PM, Steve Lewis wrote:
> I have a long rant at http://lordjoesoftware.blogspot.com/ on this but
> the moral is that there seems to have been a deliberate decision that  0,20
> code will may not be comparable with -
> I have NEVER seen a major library so directly abandon backward compatability
> 
> 
> On Fri, Nov 12, 2010 at 8:04 AM, Sebastian Schoenherr <
> sebastian.schoenh...@student.uibk.ac.at> wrote:
> 
> > Hi Steve,
> > we had a similar problem. We've compiled our code with version 0.21 but
> > included the wrong jars into the classpath. (version 0.20.2;
> > NInputFormat.java). It seems that Hadoop changed this class to an interface,
> > maybe you've a simliar problem.
> > Hope this helps.
> > Sebastian
> >
> >
> > Zitat von Steve Lewis :
> >
> >
> >  Cassandra sees this error with 0.21 of hadoop
> >>
> >> Exception in thread "main" java.lang.IncompatibleClassChangeError: Found
> >> interface org.apache.hadoop.mapreduce.JobContext, but class was expected
> >>
> >> I see something similar
> >> Error: Found interface org.apache.hadoop.mapreduce.TaskInputOutputContext,
> >> but class was expected
> >>
> >> I find this especially puzzling
> >> since org.apache.hadoop.mapreduce.TaskInputOutputContext IS a class not an
> >> interface
> >>
> >> Does anyone have bright ideas???
> >>
> >> --
> >> Steven M. Lewis PhD
> >> 4221 105th Ave Ne
> >> Kirkland, WA 98033
> >> 206-384-1340 (cell)
> >> Institute for Systems Biology
> >> Seattle WA
> >>
> >>
> >
> >
> >
> 
> 
> -- 
> Steven M. Lewis PhD
> 4221 105th Ave Ne
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Institute for Systems Biology
> Seattle WA




Re: 0.21 found interface but class was expected

2010-11-13 Thread Konstantin Boudnik
It doesn't answer my question. I guess I will have to look for the answer
somewhere else.

On Sat, Nov 13, 2010 at 03:22PM, Steve Lewis wrote:
> Java libraries are VERY reluctant to change major classes in a way that
> breaks backward compatability -
> NOTE that while the 0.18 packages are  deprecated, they are separate from
> the 0.20 packages allowing
> 0.18 code to run on 0.20 systems - this is true of virtually all Java
> libraries
> 
> On Sat, Nov 13, 2010 at 3:08 PM, Konstantin Boudnik  wrote:
> 
> > As much as I love ranting I can't help but wonder if there were any
> > promises
> > to make 0.21+ be backward compatible with <0.20 ?
> >
> > Just curious?
> >
> > On Sat, Nov 13, 2010 at 02:50PM, Steve Lewis wrote:
> > > I have a long rant at http://lordjoesoftware.blogspot.com/ on this but
> > > the moral is that there seems to have been a deliberate decision that
> >  0,20
> > > code will may not be comparable with -
> > > I have NEVER seen a major library so directly abandon backward
> > compatability
> > >
> > >
> > > On Fri, Nov 12, 2010 at 8:04 AM, Sebastian Schoenherr <
> > > sebastian.schoenh...@student.uibk.ac.at> wrote:
> > >
> > > > Hi Steve,
> > > > we had a similar problem. We've compiled our code with version 0.21 but
> > > > included the wrong jars into the classpath. (version 0.20.2;
> > > > NInputFormat.java). It seems that Hadoop changed this class to an
> > interface,
> > > > maybe you've a simliar problem.
> > > > Hope this helps.
> > > > Sebastian
> > > >
> > > >
> > > > Zitat von Steve Lewis :
> > > >
> > > >
> > > >  Cassandra sees this error with 0.21 of hadoop
> > > >>
> > > >> Exception in thread "main" java.lang.IncompatibleClassChangeError:
> > Found
> > > >> interface org.apache.hadoop.mapreduce.JobContext, but class was
> > expected
> > > >>
> > > >> I see something similar
> > > >> Error: Found interface
> > org.apache.hadoop.mapreduce.TaskInputOutputContext,
> > > >> but class was expected
> > > >>
> > > >> I find this especially puzzling
> > > >> since org.apache.hadoop.mapreduce.TaskInputOutputContext IS a class
> > not an
> > > >> interface
> > > >>
> > > >> Does anyone have bright ideas???
> > > >>
> > > >> --
> > > >> Steven M. Lewis PhD
> > > >> 4221 105th Ave Ne
> > > >> Kirkland, WA 98033
> > > >> 206-384-1340 (cell)
> > > >> Institute for Systems Biology
> > > >> Seattle WA
> > > >>
> > > >>
> > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Steven M. Lewis PhD
> > > 4221 105th Ave Ne
> > > Kirkland, WA 98033
> > > 206-384-1340 (cell)
> > > Institute for Systems Biology
> > > Seattle WA
> >
> >
> >
> 
> 
> -- 
> Steven M. Lewis PhD
> 4221 105th Ave Ne
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Institute for Systems Biology
> Seattle WA




Re: 0.21 found interface but class was expected

2010-11-13 Thread Konstantin Boudnik
Oh, thank you Todd! For a second there I thought that Hadoop developers had
promised full binary compatibility (in the true Solaris sense of the word).

Now I understand that no such thing has ever been promised, even though Hadoop
hasn't gone through a 'major' version change yet.

Seriously, Steve, you are talking about a living and breathing system here. To the
best of my understanding, the first stable Hadoop version was supposed to be 1.0, a
major version according to your own terms, which apparently hasn't come around
yet.

Now, what exactly are you frustrated about?
  Cos

On Sat, Nov 13, 2010 at 06:50PM, Todd Lipcon wrote:
> We do have policies against breaking APIs between consecutive major versions
> except for very rare exceptions (eg UnixUserGroupInformation went away when
> security was added).
> 
> We do *not* have any current policies that existing code can work against
> different major versions without a recompile in between. Switching an
> implementation class to an interface is a case where a simple recompile of
> the dependent app should be sufficient to avoid issues. For whatever reason,
> the JVM bytecode for invoking an interface method (invokeinterface) is
> different than invoking a virtual method in a class (invokevirtual).
> 
> -Todd
> 
> On Sat, Nov 13, 2010 at 5:28 PM, Lance Norskog  wrote:
> 
> > It is considered good manners :)
> >
> > Seriously, if you want to attract a community you have an obligation
> > to tell them when you're going to jerk the rug out from under their
> > feet.
> >
> > On Sat, Nov 13, 2010 at 3:27 PM, Konstantin Boudnik 
> > wrote:
> > > It doesn't answer my question. I guess I will have to look for the answer
> > somewhere else
> > >
> > > On Sat, Nov 13, 2010 at 03:22PM, Steve Lewis wrote:
> > >> Java libraries are VERY reluctant to change major classes in a way that
> > >> breaks backward compatability -
> > >> NOTE that while the 0.18 packages are  deprecated, they are separate
> > from
> > >> the 0.20 packages allowing
> > >> 0.18 code to run on 0.20 systems - this is true of virtually all Java
> > >> libraries
> > >>
> > >> On Sat, Nov 13, 2010 at 3:08 PM, Konstantin Boudnik 
> > wrote:
> > >>
> > >> > As much as I love ranting I can't help but wonder if there were any
> > >> > promises
> > >> > to make 0.21+ be backward compatible with <0.20 ?
> > >> >
> > >> > Just curious?
> > >> >
> > >> > On Sat, Nov 13, 2010 at 02:50PM, Steve Lewis wrote:
> > >> > > I have a long rant at http://lordjoesoftware.blogspot.com/ on this
> > but
> > >> > > the moral is that there seems to have been a deliberate decision
> > that
> > >> >  0,20
> > >> > > code will may not be comparable with -
> > >> > > I have NEVER seen a major library so directly abandon backward
> > >> > compatability
> > >> > >
> > >> > >
> > >> > > On Fri, Nov 12, 2010 at 8:04 AM, Sebastian Schoenherr <
> > >> > > sebastian.schoenh...@student.uibk.ac.at> wrote:
> > >> > >
> > >> > > > Hi Steve,
> > >> > > > we had a similar problem. We've compiled our code with version
> > 0.21 but
> > >> > > > included the wrong jars into the classpath. (version 0.20.2;
> > >> > > > NInputFormat.java). It seems that Hadoop changed this class to an
> > >> > interface,
> > >> > > > maybe you've a simliar problem.
> > >> > > > Hope this helps.
> > >> > > > Sebastian
> > >> > > >
> > >> > > >
> > >> > > > Zitat von Steve Lewis :
> > >> > > >
> > >> > > >
> > >> > > >  Cassandra sees this error with 0.21 of hadoop
> > >> > > >>
> > >> > > >> Exception in thread "main"
> > java.lang.IncompatibleClassChangeError:
> > >> > Found
> > >> > > >> interface org.apache.hadoop.mapreduce.JobContext, but class was
> > >> > expected
> > >> > > >>
> > >> > > >> I see something similar
> > >> > > >> Error: Found interface
> > >> > org.apache.hadoop.mapreduce.TaskInputOutputContext,
> > >> > > >> but class was expected
>

Re: How to debug (log4j.properties),

2010-11-23 Thread Konstantin Boudnik
A line like this:
  log4j.logger.org.apache.hadoop=DEBUG

works for 0.20.* and for 0.21+. Therefore it should work for all others :)
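
That line typically goes into conf/log4j.properties on the cluster side; map/reduce
task output then lands under logs/userlogs/ on the TaskTrackers rather than in the
daemon logs. For a quick one-off you can also raise the level per invocation, since
the stock bin/hadoop script honours HADOOP_ROOT_LOGGER (the jar name below is a
placeholder):

  HADOOP_ROOT_LOGGER=DEBUG,console bin/hadoop jar yourapp.jar YourMainClass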

So, are you trying to see debug output from your program or from Hadoop?

--
  Cos

On Tue, Nov 23, 2010 at 05:59PM, Tali K wrote:
> 
> 
> 
> 
> I am trying to debug my map/reduce (Hadoop)  app with help of the logging.
> When I do grep -r in $HADOOP_HOME/logs/* 
> 
> There is no line with debug info found.
> I need your help.  What am I doing wrong?
> Thanks in advance,
> Tali
> 
> In my class I put :
> 
> import org.apache.commons.logging.Log;
> import org.apache.commons.logging.LogFactory;
> 
> 
> LOG.warn("==");
> System.out.println("");
> _
> 
> 
> Here is my Log4j.properties:
> 
> log4j.rootLogger=WARN, stdout, logfile
> 
> log4j.appender.stdout=org.apache.log4j.ConsoleAppender
> log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
> log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - <%m>%n
> 
> log4j.appender.logfile=org.apache.log4j.RollingFileAppender
> log4j.appender.logfile.File=app-debug.log
> 
> 
> #log4j.appender.logfile.MaxFileSize=512KB
> # Keep three backup files.
> log4j.appender.logfile.MaxBackupIndex=3
> # Pattern to output: date priority [category] - message
> log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
> log4j.appender.logfile.layout.ConversionPattern=MYLINE %d %p [%c] - %m%n
> log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
> 
> 


Re: HDFS and libhfds

2010-12-07 Thread Konstantin Boudnik
It seems that you're trying to run ant with Java 5. Make sure your
JAVA_HOME is set properly.
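
A sketch of what that looks like in practice; the JDK paths are examples, the
forrest path is taken from the output below:

  export JAVA_HOME=/usr/lib/jvm/java-6-sun     # a 1.6 JDK for ant itself
  ant compile-c++-libhdfs -Dlibhdfs=1
  ant package -Djava5.home=/usr/lib/jvm/java-1.5.0-sun -Dforrest.home=/apache-forrest-0.8
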
--
  Take care,
Konstantin (Cos) Boudnik



2010/12/7 Petrucci Andreas :
>
> hello there, im trying to compile libhdfs in order  but there are some 
> problems. According to http://wiki.apache.org/hadoop/MountableHDFS  i have 
> already installed fuse. With ant compile-c++-libhdfs -Dlibhdfs=1 the build is 
> successful.
>
> However when i try ant package -Djava5.home=... -Dforrest.home=... the build 
> fails and the output is the below :
>
>  [exec]
>     [exec] Exception in thread "main" java.lang.UnsupportedClassVersionError: 
> Bad version number in .class file
>     [exec]     at java.lang.ClassLoader.defineClass1(Native Method)
>     [exec]     at java.lang.ClassLoader.defineClass(ClassLoader.java:620)
>     [exec]     at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>     [exec]     at java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>     [exec]     at java.net.URLClassLoader.access$100(URLClassLoader.java:56)
>     [exec]     at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>     [exec]     at java.security.AccessController.doPrivileged(Native Method)
>     [exec]     at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>     [exec]     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     [exec]     at 
> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
>     [exec]     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>     [exec]     at 
> org.apache.avalon.excalibur.logger.DefaultLogTargetFactoryManager.configure(DefaultLogTargetFactoryManager.java:113)
>     [exec]     at 
> org.apache.avalon.framework.container.ContainerUtil.configure(ContainerUtil.java:201)
>     [exec]     at 
> org.apache.avalon.excalibur.logger.LogKitLoggerManager.setupTargetFactoryManager(LogKitLoggerManager.java:436)
>     [exec]     at 
> org.apache.avalon.excalibur.logger.LogKitLoggerManager.configure(LogKitLoggerManager.java:400)
>     [exec]     at 
> org.apache.avalon.framework.container.ContainerUtil.configure(ContainerUtil.java:201)
>     [exec]     at 
> org.apache.cocoon.core.CoreUtil.initLogger(CoreUtil.java:607)
>     [exec]     at org.apache.cocoon.core.CoreUtil.init(CoreUtil.java:169)
>     [exec]     at org.apache.cocoon.core.CoreUtil.(CoreUtil.java:115)
>     [exec]     at 
> org.apache.cocoon.bean.CocoonWrapper.initialize(CocoonWrapper.java:128)
>     [exec]     at 
> org.apache.cocoon.bean.CocoonBean.initialize(CocoonBean.java:97)
>     [exec]     at org.apache.cocoon.Main.main(Main.java:310)
>     [exec] Java Result: 1
>     [exec]
>     [exec]   Copying broken links file to site root.
>     [exec]
>     [exec]
>     [exec] BUILD FAILED
>     [exec] /apache-forrest-0.8/main/targets/site.xml:175: Warning: Could not 
> find file /hadoop-0.20.2/src/docs/build/tmp/brokenlinks.xml to copy.
>     [exec]
>     [exec] Total time: 4 seconds
>
> BUILD FAILED
> /hadoop-0.20.2/build.xml:867: exec returned: 1
>
>
> any ideas what's wrong???
>


Re: HDFS and libhfds

2010-12-07 Thread Konstantin Boudnik
Feel free to update https://issues.apache.org/jira/browse/HDFS-1519 if
you find it suitable.


2010/12/7 Petrucci Andreas :
>
> thanks for the replies, this solved my problems
>
> http://mail-archives.apache.org/mod_mbox/hadoop-common-user/200909.mbox/%3c6f5c1d715b2da5498a628e6b9c124f040145221...@hasmsx504.ger.corp.intel.com%3e
>
> ...i think i should write a post in my blog about this night with hdfs, 
> libhdfs and fuse...
>
>> Date: Tue, 7 Dec 2010 22:44:39 -0700
>> Subject: Re: HDFS and libhfds
>> From: sudhir.vallamko...@icrossing.com
>> To: common-user@hadoop.apache.org
>>
>> I second Ed's answer. Try uninstalling whatever you installed and start
>> fresh. Whenever I see this error when trying to installing a native bridge,
>> this solution always worked for me.
>>
>>
>> On 12/7/10 5:07 PM, "common-user-digest-h...@hadoop.apache.org"
>>  wrote:
>>
>> > From: Edward Capriolo 
>> > Date: Tue, 7 Dec 2010 17:22:03 -0500
>> > To: 
>> > Subject: Re: HDFS and libhfds
>> >
>> > 2010/12/7 Petrucci Andreas :
>> >>
>> >> hello there, im trying to compile libhdfs in order  but there are some
>> >> problems. According to http://wiki.apache.org/hadoop/MountableHDFS  i have
>> >> already installes fuse. With ant compile-c++-libhdfs -Dlibhdfs=1 the 
>> >> buils is
>> >> successful.
>> >>
>> >> However when i try ant package -Djava5.home=... -Dforrest.home=... the 
>> >> build
>> >> fails and the output is the below :
>> >>
>> >>  [exec]
>> >>     [exec] Exception in thread "main" 
>> >> java.lang.UnsupportedClassVersionError:
>> >> Bad version number in .class file
>> >>     [exec]     at java.lang.ClassLoader.defineClass1(Native Method)
>> >>     [exec]     at java.lang.ClassLoader.defineClass(ClassLoader.java:620)
>> >>     [exec]     at
>> >> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124)
>> >>     [exec]     at
>> >> java.net.URLClassLoader.defineClass(URLClassLoader.java:260)
>> >>     [exec]     at 
>> >> java.net.URLClassLoader.access$100(URLClassLoader.java:56)
>> >>     [exec]     at java.net.URLClassLoader$1.run(URLClassLoader.java:195)
>> >>     [exec]     at java.security.AccessController.doPrivileged(Native 
>> >> Method)
>> >>     [exec]     at 
>> >> java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>> >>     [exec]     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>> >>     [exec]     at
>> >> sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:268)
>> >>     [exec]     at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
>> >>     [exec]     at
>> >> org.apache.avalon.excalibur.logger.DefaultLogTargetFactoryManager.configure(D
>> >> efaultLogTargetFactoryManager.java:113)
>> >>     [exec]     at
>> >> org.apache.avalon.framework.container.ContainerUtil.configure(ContainerUtil.j
>> >> ava:201)
>> >>     [exec]     at
>> >> org.apache.avalon.excalibur.logger.LogKitLoggerManager.setupTargetFactoryMana
>> >> ger(LogKitLoggerManager.java:436)
>> >>     [exec]     at
>> >> org.apache.avalon.excalibur.logger.LogKitLoggerManager.configure(LogKitLogger
>> >> Manager.java:400)
>> >>     [exec]     at
>> >> org.apache.avalon.framework.container.ContainerUtil.configure(ContainerUtil.j
>> >> ava:201)
>> >>     [exec]     at
>> >> org.apache.cocoon.core.CoreUtil.initLogger(CoreUtil.java:607)
>> >>     [exec]     at org.apache.cocoon.core.CoreUtil.init(CoreUtil.java:169)
>> >>     [exec]     at 
>> >> org.apache.cocoon.core.CoreUtil.(CoreUtil.java:115)
>> >>     [exec]     at
>> >> org.apache.cocoon.bean.CocoonWrapper.initialize(CocoonWrapper.java:128)
>> >>     [exec]     at
>> >> org.apache.cocoon.bean.CocoonBean.initialize(CocoonBean.java:97)
>> >>     [exec]     at org.apache.cocoon.Main.main(Main.java:310)
>> >>     [exec] Java Result: 1
>> >>     [exec]
>> >>     [exec]   Copying broken links file to site root.
>> >>     [exec]
>> >>     [exec]
>> >>     [exec] BUILD FAILED
>> >>     [exec] /apache-forrest-0.8/main/targets/site.xml:175: Warning: Could 
>> >> not
>> >> find file /hadoop-0.20.2/src/docs/build/tmp/brokenlinks.xml to copy.
>> >>     [exec]
>> >>     [exec] Total time: 4 seconds
>> >>
>> >> BUILD FAILED
>> >> /hadoop-0.20.2/build.xml:867: exec returned: 1
>> >>
>> >>
>> >> any ideas what's wrong???
>> >>
>> >
>> > I never saw this usage:
>> > -Djava5.home
>> > Try
>> > export JAVA_HOME=/usr/java
>> >
>> > " Bad version number in .class file " means you are mixing and
>> > matching java versions somehow.
>>
>>
>> iCrossing Privileged and Confidential Information
>> This email message is for the sole use of the intended recipient(s) and may 
>> contain confidential and privileged information of iCrossing. Any 
>> unauthorized review, use, disclosure or distribution is prohibited. If you 
>> are not the intended recipient, please contact the sender by reply email and 
>> destroy all copies of the original message.
>>
>>
>


Re: urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread Konstantin Boudnik
it seems that you are looking at 2 different directories:

first post: /your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
second: ls -l  tmp/dir/hadoop-hadoop/dfs/hadoop
--
  Take care,
Konstantin (Cos) Boudnik



On Wed, Dec 8, 2010 at 14:19, Richard Zhang  wrote:
> would that be the reason that 54310 port is not open?
> I just used
> * iptables -A INPUT -p tcp --dport 54310 -j ACCEPT
> to open the port.
> But it seems the same error exists.
> Richard
> *
> On Wed, Dec 8, 2010 at 4:56 PM, Richard Zhang wrote:
>
>> Hi James:
>> I verified that I have the following permission set for the path:
>>
>> ls -l tmp/dir/hadoop-hadoop/dfs/hadoop
>> total 4
>> drwxr-xr-x 2 hadoop hadoop 4096 2010-12-08 15:56 current
>> Thanks.
>> Richard
>>
>>
>>
>> On Wed, Dec 8, 2010 at 4:50 PM, james warren  wrote:
>>
>>> Hi Richard -
>>>
>>> First thing that comes to mind is a permissions issue.  Can you verify
>>> that
>>> your directories along the desired namenode path are writable by the
>>> appropriate user(s)?
>>>
>>> HTH,
>>> -James
>>>
>>> On Wed, Dec 8, 2010 at 1:37 PM, Richard Zhang >> >wrote:
>>>
>>> > Hi Guys:
>>> > I am just installing hadoop 0.21.0 in a single-node cluster.
>>> > I encounter the following error when I run bin/hadoop namenode -format
>>> >
>>> > 10/12/08 16:27:22 ERROR namenode.NameNode:
>>> > java.io.IOException: Cannot create directory
>>> > /your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
>>> >        at
>>> >
>>> >
>>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:312)
>>> >        at
>>> > org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1425)
>>> >        at
>>> > org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1444)
>>> >        at
>>> >
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1242)
>>> >        at
>>> >
>>> >
>>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1348)
>>> >        at
>>> > org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)
>>> >
>>> >
>>> > Below is my core-site.xml
>>> >
>>> > <?xml version="1.0"?>
>>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>> >
>>> > <configuration>
>>> >
>>> > <property>
>>> >  <name>hadoop.tmp.dir</name>
>>> >  <value>/your/path/to/hadoop/tmp/dir/hadoop-${user.name}</value>
>>> >  <description>A base for other temporary directories.</description>
>>> > </property>
>>> >
>>> > <property>
>>> >  <name>fs.default.name</name>
>>> >  <value>hdfs://localhost:54310</value>
>>> >  <description>The name of the default file system.  A URI whose
>>> >  scheme and authority determine the FileSystem implementation.  The
>>> >  uri's scheme determines the config property (fs.SCHEME.impl) naming
>>> >  the FileSystem implementation class.  The uri's authority is used to
>>> >  determine the host, port, etc. for a filesystem.</description>
>>> > </property>
>>> >
>>> > </configuration>
>>> >
>>> >
>>> > Below is my hdfs-site.xml
>>> >
>>> > <?xml version="1.0"?>
>>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>> >
>>> > <configuration>
>>> >
>>> > <property>
>>> >  <name>dfs.replication</name>
>>> >  <value>1</value>
>>> >  <description>Default block replication.
>>> >  The actual number of replications can be specified when the file is
>>> > created.
>>> >  The default is used if replication is not specified in create time.
>>> >  </description>
>>> > </property>
>>> >
>>> > </configuration>
>>> >
>>> >
>>> > below is my mapred-site.xml:
>>> >
>>> > <?xml version="1.0"?>
>>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>> >
>>> > <configuration>
>>> >
>>> > <property>
>>> >  <name>mapred.job.tracker</name>
>>> >  <value>localhost:54311</value>
>>> >  <description>The host and port that the MapReduce job tracker runs
>>> >  at.  If "local", then jobs are run in-process as a single map
>>> >  and reduce task.
>>> >  </description>
>>> > </property>
>>> >
>>> > </configuration>
>>> >
>>> >
>>> > Thanks.
>>> > Richard
>>> > *
>>> >
>>>
>>
>>
>


Re: urgent, error: java.io.IOException: Cannot create directory

2010-12-08 Thread Konstantin Boudnik
Yeah, I figured that much. What I was referring to is the endings of the paths:
.../hadoop-hadoop/dfs/name/current
.../hadoop-hadoop/dfs/hadoop
They are different
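If it really is permissions, something along these lines (user/group names
are an assumption) usually sorts it out - the whole parent chain has to be
writable by the user doing the format, not just the last directory:

  sudo mkdir -p /your/path/to/hadoop/tmp/dir
  sudo chown -R hadoop:hadoop /your/path/to/hadoop/tmp/dir
  sudo -u hadoop bin/hadoop namenode -format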
--
  Take care,
Konstantin (Cos) Boudnik



On Wed, Dec 8, 2010 at 15:55, Richard Zhang  wrote:
> Hi:
> "/your/path/to/hadoop"  represents the location where hadoop is installed.
> BTW, I believe this is a file writing permission problem. Because I use the
> same *-site.xml setting to install with root and it works.
> But when I use the dedicated user hadoop, it always introduces this problem.
>
> But I did create the directory path manually and granted it 755.
> Weird
> Richard.
>
> On Wed, Dec 8, 2010 at 6:51 PM, Konstantin Boudnik  wrote:
>
>> it seems that you are looking at 2 different directories:
>>
>> first post: /your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
>> second: ls -l                              tmp/dir/hadoop-hadoop/dfs/hadoop
>> --
>>   Take care,
>> Konstantin (Cos) Boudnik
>>
>>
>>
>> On Wed, Dec 8, 2010 at 14:19, Richard Zhang 
>> wrote:
>> > would that be the reason that 54310 port is not open?
>> > I just used
>> > * iptables -A INPUT -p tcp --dport 54310 -j ACCEPT
>> > to open the port.
>> > But it seems the same error exists.
>> > Richard
>> > *
>> > On Wed, Dec 8, 2010 at 4:56 PM, Richard Zhang > >wrote:
>> >
>> >> Hi James:
>> >> I verified that I have the following permission set for the path:
>> >>
>> >> ls -l tmp/dir/hadoop-hadoop/dfs/hadoop
>> >> total 4
>> >> drwxr-xr-x 2 hadoop hadoop 4096 2010-12-08 15:56 current
>> >> Thanks.
>> >> Richard
>> >>
>> >>
>> >>
>> >> On Wed, Dec 8, 2010 at 4:50 PM, james warren  wrote:
>> >>
>> >>> Hi Richard -
>> >>>
>> >>> First thing that comes to mind is a permissions issue.  Can you verify
>> >>> that
>> >>> your directories along the desired namenode path are writable by the
>> >>> appropriate user(s)?
>> >>>
>> >>> HTH,
>> >>> -James
>> >>>
>> >>> On Wed, Dec 8, 2010 at 1:37 PM, Richard Zhang > >>> >wrote:
>> >>>
>> >>> > Hi Guys:
>> >>> > I am just installing hadoop 0.21.0 in a single-node cluster.
>> >>> > I encounter the following error when I run bin/hadoop namenode
>> -format
>> >>> >
>> >>> > 10/12/08 16:27:22 ERROR namenode.NameNode:
>> >>> > java.io.IOException: Cannot create directory
>> >>> > /your/path/to/hadoop/tmp/dir/hadoop-hadoop/dfs/name/current
>> >>> >        at
>> >>> >
>> >>> >
>> >>>
>> org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.clearDirectory(Storage.java:312)
>> >>> >        at
>> >>> >
>> org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1425)
>> >>> >        at
>> >>> >
>> org.apache.hadoop.hdfs.server.namenode.FSImage.format(FSImage.java:1444)
>> >>> >        at
>> >>> >
>> >>>
>> org.apache.hadoop.hdfs.server.namenode.NameNode.format(NameNode.java:1242)
>> >>> >        at
>> >>> >
>> >>> >
>> >>>
>> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1348)
>> >>> >        at
>> >>> >
>> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1368)
>> >>> >
>> >>> >
>> >>> > Below is my core-site.xml
>> >>> >
>> >>> > <?xml version="1.0"?>
>> >>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> >>> >
>> >>> > <configuration>
>> >>> >
>> >>> > <property>
>> >>> >  <name>hadoop.tmp.dir</name>
>> >>> >  <value>/your/path/to/hadoop/tmp/dir/hadoop-${user.name}</value>
>> >>> >  <description>A base for other temporary directories.</description>
>> >>> > </property>
>> >>> >
>> >>> > <property>
>> >>> >  <name>fs.default.name</name>
>> >>> >  <value>hdfs://localhost:54310</value>
>> >>> >  <description>The name of the default file system.  A URI whose
>> >>> >  scheme and authority determine the FileSystem implementation.  The
>> >>> >  uri's scheme determines the config property (fs.SCHEME.impl) naming
>> >>> >  the FileSystem implementation class.  The uri's authority is used to
>> >>> >  determine the host, port, etc. for a filesystem.</description>
>> >>> > </property>
>> >>> >
>> >>> > </configuration>
>> >>> >
>> >>> >
>> >>> > Below is my hdfs-site.xml
>> >>> >
>> >>> > <?xml version="1.0"?>
>> >>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> >>> >
>> >>> > <configuration>
>> >>> >
>> >>> > <property>
>> >>> >  <name>dfs.replication</name>
>> >>> >  <value>1</value>
>> >>> >  <description>Default block replication.
>> >>> >  The actual number of replications can be specified when the file is
>> >>> > created.
>> >>> >  The default is used if replication is not specified in create time.
>> >>> >  </description>
>> >>> > </property>
>> >>> >
>> >>> > </configuration>
>> >>> >
>> >>> >
>> >>> > below is my mapred-site.xml:
>> >>> >
>> >>> > <?xml version="1.0"?>
>> >>> > <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>> >>> >
>> >>> > <configuration>
>> >>> >
>> >>> > <property>
>> >>> >  <name>mapred.job.tracker</name>
>> >>> >  <value>localhost:54311</value>
>> >>> >  <description>The host and port that the MapReduce job tracker runs
>> >>> >  at.  If "local", then jobs are run in-process as a single map
>> >>> >  and reduce task.
>> >>> >  </description>
>> >>> > </property>
>> >>> >
>> >>> > </configuration>
>> >>> >
>> >>> >
>> >>> > Thanks.
>> >>> > Richard
>> >>> > *
>> >>> >
>> >>>
>> >>
>> >>
>> >
>>
>


Re: Mapreduce Exceptions with hadoop 0.20.2

2010-12-09 Thread Konstantin Boudnik
On Thu, Dec 9, 2010 at 19:55, Praveen Bathala  wrote:
> I did this
> prav...@praveen-desktop:~/hadoop/hadoop-0.20.2$ bin/hadoop dfsadmin
> -safemode leave
> Safe mode is OFF
> prav...@praveen-desktop:~/hadoop/hadoop-0.20.2$ bin/hadoop dfsadmin
> -safemode get
> Safe mode is OFF

This is not a configuration setting: this is only a runtime on/off
switch. Once you have restarted the cluster your NN will go into
safemode (for a number of reasons). TTs are made to quit if they can't
connect to HDFS after some timeout (60 seconds if I remember
correctly). Once your NN is back from its safemode you can safely
start MR daemons and everything should be just fine.

Simply put: be patient ;)
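If you'd rather script it than watch the logs, you can block until the NN
leaves safemode before starting the MR daemons:

  bin/hadoop dfsadmin -safemode wait    # returns once safe mode is OFF
  bin/start-mapred.sh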


> and then I restarted my cluster and still I see the INFO in namenode logs
> saying in safemode..
>
> somehow I am getting my Map output fine, but the job.isSuccessful() is
> returning false.
>
> Any help on that.
>
> Thanks
> + Praveen
>
> On Thu, Dec 9, 2010 at 9:28 PM, Mahadev Konar  wrote:
>
>> Hi Praveen,
>>  Looks like its your namenode that's still in safemode.
>>
>>
>> http://wiki.apache.org/hadoop/FAQ
>>
>> The safemode feature in the namenode waits till a certain number of
>> threshold for hdfs blocks have been reported by the datanodes,  before
>> letting clients making edits to the namespace. It usually happens when you
>> reboot your namenode. You can read more about the safemode in the above FAQ.
>>
>> Thanks
>> mahadev
>>
>>
>> On 12/9/10 6:09 PM, "Praveen Bathala"  wrote:
>>
>> Hi,
>>
>> I am running Mapreduce job to get some emails out of a huge text file.
>> I used to use hadoop 0.19 version and I had no issues, now I am using the
>> hadoop 0.20.2 and when I run my hadoop mapreduce job I see the log as job
>> failed and in the jobtracker log
>>
>> Can someone please help me..
>>
>> 2010-12-09 20:53:00,399 INFO org.apache.hadoop.mapred.JobTracker: problem
>> cleaning system directory:
>> hdfs://localhost:9000/home/praveen/hadoop/temp/mapred/system
>> org.apache.hadoop.ipc.RemoteException:
>> org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete
>> /home/praveen/hadoop/temp/mapred/system. Name node is in safe mode.
>> The ratio of reported blocks 0. has not reached the threshold 0.9990.
>> Safe mode will be turned off automatically.
>>        at
>>
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:1700)
>>        at
>>
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:1680)
>>        at
>> org.apache.hadoop.hdfs.server.namenode.NameNode.delete(NameNode.java:517)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>>        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>>        at java.security.AccessController.doPrivileged(Native Method)
>>        at javax.security.auth.Subject.doAs(Subject.java:396)
>>        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>>
>>  at org.apache.hadoop.ipc.Client.call(Client.java:740)
>>        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>>        at $Proxy4.delete(Unknown Source)
>>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>        at
>>
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>        at
>>
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>        at java.lang.reflect.Method.invoke(Method.java:597)
>>        at
>>
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>        at
>>
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>        at $Proxy4.delete(Unknown Source)
>>        at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:582)
>>        at
>>
>> org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:227)
>>        at org.apache.hadoop.mapred.JobTracker.(JobTracker.java:1695)
>>        at
>> org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:183)
>>        at
>> org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:175)
>>        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3702)
>> 2010-12-09 20:53:10,405 INFO org.apache.hadoop.mapred.JobTracker: Cleaning
>> up the system directory
>> 2010-12-09 20:53:10,409 INFO org.apache.hadoop.mapred.JobTracker: problem
>> cleaning system directory:
>> hdfs://localhost:9000/home/praveen/hadoop/temp/mapred/system
>>
>>
>> Thanks in advance
>> + Praveen
>>
>>
>
>
> --
> + Praveen
>


Re: Hadoop Certification Progamme

2010-12-15 Thread Konstantin Boudnik
Hey, commit rights won't give you a nice looking certificate, would it? ;)

On Wed, Dec 15, 2010 at 09:12, Steve Loughran  wrote:
> On 09/12/10 03:40, Matthew John wrote:
>>
>> Hi all,.
>>
>> Is there any valid Hadoop Certification available ? Something which adds
>> credibility to your Hadoop expertise.
>>
>
> Well, there's always providing enough patches to the code to get commit
> rights :)
>


Re: Hadoop Certification Progamme

2010-12-15 Thread Konstantin Boudnik
On Wed, Dec 15, 2010 at 09:28, James Seigel  wrote:
> But it would give you the right creds for people that you’d want to work for 
> :)

I believe you meant to say "you'd want to work _with_" ? 'cause from my
experience people you work _for_ care more about nice looking
certificates rather than real creds such as apache commit rights.

> James
>
>
> On 2010-12-15, at 10:26 AM, Konstantin Boudnik wrote:
>
>> Hey, commit rights won't give you a nice looking certificate, would it? ;)
>>
>> On Wed, Dec 15, 2010 at 09:12, Steve Loughran  wrote:
>>> On 09/12/10 03:40, Matthew John wrote:
>>>>
>>>> Hi all,.
>>>>
>>>> Is there any valid Hadoop Certification available ? Something which adds
>>>> credibility to your Hadoop expertise.
>>>>
>>>
>>> Well, there's always providing enough patches to the code to get commit
>>> rights :)
>>>
>
>


Re: Hadoop Certification Progamme

2010-12-15 Thread Konstantin Boudnik
On Wed, Dec 15, 2010 at 09:35, Steve Loughran  wrote:
> On 15/12/10 17:26, Konstantin Boudnik wrote:
>>
>> Hey, commit rights won't give you a nice looking certificate, would it? ;)
>>
>
> Depends on what hudson says about the quality of your patches. I mean, if
> every commit breaks the build, it soon becomes public

Right, the key words of my post were 'nice looking'.


Fwd: How to simulate network delay on 1 node

2010-12-26 Thread Konstantin Boudnik
Hi there.

What are looking at is fault injection.
I am not sure what version of Hadoop you're looking at, but here's what
to take a look at in 0.21 and forward:
  - Herriot system testing framework (which does code instrumentation
to add special APIs) on a real clusters. Here's some starting
pointers:
- source code is in src/test/system
- http://wiki.apache.org/hadoop/HowToUseSystemTestFramework
  - fault injection framework (should've been ported to 0.20 as well)
- Source code is under src/test/aop
- http://hadoop.apache.org/hdfs/docs/r0.21.0/faultinject_framework.html

If you are running on simulated infrastructure you don't need to look
further than fault injection framework. There's a test in HDFS which
does pretty much what you're looking for but for pipe-lines (look
under src/test/aop/org/apache/hadoop/hdfs/*).

If you are on a physical cluster then you need to use a combination of
1st and 2nd. The implementation of faults in system tests is coming
into Hadoop at some point in the not very distant future, so you might
want to wait a little bit.
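Just to give you the flavour of it, an aspect injecting a delay could look
roughly like the sketch below. The pointcut is an assumption - the stock
fault injection tests pick their own join points - and you still need the
fault injection ant targets to weave it into the jars:

  package org.apache.hadoop.fi;

  import org.aspectj.lang.annotation.Aspect;
  import org.aspectj.lang.annotation.Before;

  // Sketch only: slow down every block send by half a second to mimic a
  // laggy network link. BlockSender.sendBlock is an assumed join point,
  // not the one the stock HDFS fault injection tests use.
  @Aspect
  public class SlowPipelineAspect {
    @Before("execution(* org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(..))")
    public void injectDelay() {
      try {
        Thread.sleep(500);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
      }
    }
  }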
--
  Take care,
Konstantin (Cos) Boudnik

On Sun, Dec 26, 2010 at 04:25, yipeng  wrote:
> Hi everyone,
>
> I would like to simulate network delay on 1 node in my cluster, perhaps by
> putting the thread to sleep every time it transfers data non-locally. I'm
> looking at the source but am not sure where to place the code. Is there a
> better way to do it... a tool perhaps? Or could someone point me in the
> right direction?
>
> Cheers,
>
> Yipeng
>


Re: how to build hadoop in Linux

2010-12-30 Thread Konstantin Boudnik
The Java5 dependency is about to go away from Hadoop. See HADOOP-7072. I
will try to commit it first thing next year. So, wait a couple of days
and you'll be all right.
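In the meantime, if all you are after is a jar with your code changes, plain

  ant jar

from the top of the source tree should be enough - it drops the core jar
under build/ and needs neither forrest nor -Djava5.home. Those only matter
for the documentation that "ant package" bundles into a release tarball.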

Happy New Year everyone!


On Thu, Dec 30, 2010 at 22:08, Da Zheng  wrote:
> Hello,
>
> I need to build hadoop in Linux as I need to make some small changes in the
> code, but I don't know what is the simplest way to build hadoop. I googled it
> and so far I only found two places that tell how to build hadoop. One is
> http://bigdata.wordpress.com/2010/05/27/hadoop-cookbook-3-how-to-build-your-own-hadoop-distribution/.
> I downloaded apache forrest, and do as it
> ant -Djava5.home=/usr/lib/jvm/java-1.5.0-gcj-4.4/
> -Dforrest.home=/home/zhengda/apache-forrest-0.8 compile-core tar
> and get an error:
>     [exec] BUILD FAILED
>     [exec] /home/zhengda/apache-forrest-0.8/main/targets/validate.xml:158:
> java.lang.NullPointerException
> What does this error mean? it seems apache forrest is used to create hadoop
> document and I just want to rebuild hadoop java code. Is there a way for me to
> just rebuild java code? I ran "ant", it seems to work successfully, but I 
> don't
> know if it really compiled the code.
>
> the other place I found is to show how to build hadoop with eclipse. I use
> macbook and I have to ssh to linux boxes to work on hadoop, so it's not a very
> good option even if it can really work.
>
> Best,
> Da
>


Re: Entropy Pool and HDFS FS Commands Hanging System

2011-01-03 Thread Konstantin Boudnik
Another possibility to fix it is to install rng-tools which will allow
you to increase the amount of entropy in your system.
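On a Debian/Ubuntu style box that is roughly (package and file names may
differ on your distro):

  sudo apt-get install rng-tools
  # most boards have no hardware RNG, so feed rngd from /dev/urandom;
  # a quality trade-off, but it keeps /dev/random from blocking
  echo 'HRNGDEVICE=/dev/urandom' | sudo tee -a /etc/default/rng-tools
  sudo /etc/init.d/rng-tools restart

An alternative that doesn't touch the OS is to point the JVM at urandom,
e.g. add -Djava.security.egd=file:/dev/./urandom to HADOOP_OPTS in
hadoop-env.sh.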
--
  Take care,
Konstantin (Cos) Boudnik



On Mon, Jan 3, 2011 at 16:48, Jon Lederman  wrote:
> Thanks.  Will try that.  One final question, based on the jstack output I 
> sent, is it obvious that the system is blocked due to the behavior of 
> /dev/random?  That is, can you enlighten me to the output I sent that 
> explicitly or implicitly indicates the blocking?  I am trying to understand 
> whether this is in fact the problem or whether there could be some other 
> issue.
>
> If I just let the FS command run (i.e., hadoop fs -ls), is there any 
> guarantee it will eventually return in some relatively finite period of time 
> such as hours, or could it potentially take days, weeks, years or eternity?
>
> Thanks in advance.
>
> -Jon
> On Jan 3, 2011, at 4:41 PM, Ted Dunning wrote:
>
>> try
>>
>>   dd if=/dev/random bs=1 count=100 of=/dev/null
>>
>> This will likely hang for a long time.
>>
>> There is no way that I know of to change the behavior of /dev/random except
>> by changing the file itself to point to a different minor device.  That
>> would be very bad form.
>>
>> One think you may be able do is to pour lots of entropy into the system via
>> /dev/urandom.  I was not able to demonstrate this, though, when I just tried
>> that.  It would be nice if there were a config variable to set that would
>> change this behavior, but right now, a code change is required (AFAIK).
>>
>> Another thing to do is replace the use of SecureRandom with a version that
>> uses /dev/urandom.  That is the point of the code that I linked to.  It
>> provides a plugin replacement that will not block.
>>
>> On Mon, Jan 3, 2011 at 4:31 PM, Jon Lederman  wrote:
>>
>>>
>>> Could you give me a bit more information on how I can overcome this issue.
>>> I am running Hadoop on an embedded processor and networking is turned off
>>> to the embedded processor. Is there a quick way to check whether this is in
>>> fact blocking on my system?  And, are there some variables or configuration
>>> options I can set to avoid any potential blocking behavior?
>>>
>>>
>
>


Re: Import data from mysql

2011-01-08 Thread Konstantin Boudnik
There's a supported tool with all bells and whistles:
  http://www.cloudera.com/downloads/sqoop/
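A basic import boils down to a one-liner along these lines (the connect
string, table and user names are made up for the example):

  sqoop import --connect jdbc:mysql://dbhost/mydb --table orders \
        --username reports -P -m 4

and if your table has an id or timestamp column, newer Sqoop releases can
also do incremental imports so the nightly job only pulls the rows added
since the last run.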

--
  Take care,
Konstantin (Cos) Boudnik

On Sat, Jan 8, 2011 at 18:57, Sonal Goyal  wrote:
> Hi Brian,
>
> You can check HIHO at https://github.com/sonalgoyal/hiho which can help you
> load data from any JDBC database to the Hadoop file system. If your table
> has a date or id field, or any indicator for modified/newly added rows, you
> can import only the altered rows every day. Please let me know if you need
> help.
>
> Thanks and Regards,
> Sonal
> Connect Hadoop with databases,
> Salesforce, FTP servers and others 
> Nube Technologies 
>
> 
>
>
>
>
>
> On Sun, Jan 9, 2011 at 5:03 AM, Brian McSweeney
> wrote:
>
>> Hi folks,
>>
>> I'm a TOTAL newbie on hadoop. I have an existing webapp that has a growing
>> number of rows in a mysql database that I have to compare against one
>> another once a day from a batch job. This is an exponential problem as
>> every
>> row must be compared against every other row. I was thinking of
>> parallelizing this computation via hadoop. As such, I was thinking that
>> perhaps the first thing to look at is how to bring info from a database to
>> a
>> hadoop job and vise versa. I have seen the following relevant info
>>
>> https://issues.apache.org/jira/browse/HADOOP-2536
>>
>> and also
>>
>> http://architects.dzone.com/articles/tools-moving-sql-database
>>
>> any advice on what approach to use?
>>
>> cheers,
>> Brian
>>
>


Re: When applying a patch, which attachment should I use?

2011-01-10 Thread Konstantin Boudnik
Yeah, that's pretty crazy all right. In your case it looks like the 3
patches at the top are the latest for the 0.20-append branch, the 0.21 branch
and trunk (which is essentially the 0.22 branch at the moment). It doesn't look
like you need to apply all of them - just try the latest one for your
particular branch.

The mess is caused by the fact that people use different names for
consecutive patches (as in file.1.patch, file.2.patch, etc.). This is
_very_ confusing indeed, especially when different contributors work
on the same fix/feature.
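Mechanically, applying one is usually just (the file name below is a
placeholder for whichever attachment matches your branch):

  cd hadoop-hdfs                                  # root of your source tree
  patch -p0 --dry-run < HDFS-630-for-your-branch.patch   # check it applies
  patch -p0 < HDFS-630-for-your-branch.patch
  ant compile

Hadoop patches are normally generated from the project root without a/ b/
prefixes, hence -p0; if it doesn't apply cleanly you most likely grabbed an
attachment made for a different branch.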
--
  Take care,
Konstantin (Cos) Boudnik


On Mon, Jan 10, 2011 at 01:10, edward choi  wrote:
> Hi,
> For the first time I am about to apply a patch to HDFS.
>
> https://issues.apache.org/jira/browse/HDFS-630
>
> Above is the one that I am trying to do.
> But there are like 15 patches and I don't know which one to use.
>
> Could anyone tell me if I need to apply them all or just the one at the top?
>
> The whole patching process is just so confusing :-(
>
> Ed
>


Re: Application for testing

2011-01-11 Thread Konstantin Boudnik
(Moving general@ to Bcc: list)

Bo, you can try to run TeraSort from the Hadoop examples: you'll see if the
cluster is up and running and can compare its performance between upgrades, if
needed.
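For example (the examples jar name varies a little between releases, so
adjust the glob; 10 million rows is roughly 1GB of input):

  bin/hadoop jar hadoop-*-examples.jar teragen 10000000 /bench/tera-in
  bin/hadoop jar hadoop-*-examples.jar terasort /bench/tera-in /bench/tera-out
  bin/hadoop jar hadoop-*-examples.jar teravalidate /bench/tera-out /bench/tera-report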

Also, please don't use general@ for user questions: there's common-user@ list
exactly for these purposes.

With regards,
  Cos

On Tue, Jan 11, 2011 at 07:50AM, Bo Sang wrote:
> Hi, guys:
> 
> I have deployed a hadoop on our group's nodes. Could you recommend some
> typical applications for me? I want to test whether it can really work and
> observe its performance.
> 
> -- 
> Best Regards!
> 
> Sincerely
> Bo Sang




Re: error compiling hadoop-mapreduce

2011-01-21 Thread Konstantin Boudnik
Bcc'ing common-user, adding mapreduce-user@ list instead. You have a
better chance to get your question answered if you send it to the
correct list.

For the answer see https://issues.apache.org/jira/browse/MAPREDUCE-2282
--
  Take care,
Konstantin (Cos) Boudnik



On Fri, Jan 21, 2011 at 09:08, Edson Ramiro  wrote:
> Hi all,
>
> I'm compiling hadoop from git using these instructions [1].
>
> The hadoop-common and hadoop-hdfs are okay, they compile without erros, but
> when I execute ant mvn-install to compile hadoop-mapreduce I get this error.
>
> compile-mapred-test:
>    [javac] /home/lbd/hadoop/hadoop-ramiro/hadoop-mapreduce/build.xml:602:
> warning: 'includeantruntime' was not set, defaulting to
> build.sysclasspath=last; set to false for repeatable builds
>    [javac] Compiling 179 source files to
> /home/lbd/hadoop/hadoop-ramiro/hadoop-mapreduce/build/test/mapred/classes
>    [javac]
> /home/lbd/hadoop/hadoop-ramiro/hadoop-mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMRServerPorts.java:84:
> cannot find symbol
>    [javac] symbol  : variable NAME_NODE_HOST
>    [javac]         TestHDFSServerPorts.NAME_NODE_HOST + "0");
>    [javac]                            ^
>    [javac]
> /home/lbd/hadoop/hadoop-ramiro/hadoop-mapreduce/src/test/mapred/org/apache/hadoop/mapred/TestMRServerPorts.java:86:
> cannot find symbol
>    [javac] symbol  : variable NAME_NODE_HTTP_HOST
>    [javac] location: class org.apache.hadoop.hdfs.TestHDFSServerPorts
>    [javac]         TestHDFSServerPorts.NAME_NODE_HTTP_HOST + "0");
>    [javac]                            ^
>    ...
>
> Is that a bug?
>
> This is my build.properties
>
> #this is essential
> resolvers=internal
> #you can increment this number as you see fit
> version=0.22.0-alpha-1
> project.version=${version}
> hadoop.version=${version}
> hadoop-core.version=${version}
> hadoop-hdfs.version=${version}
> hadoop-mapred.version=${version}
>
> Other question, Is the 0.22.0-alpha-1 the latest version?
>
> Thanks in advance,
>
> [1] https://github.com/apache/hadoop-mapreduce
>
> --
> Edson Ramiro Lucas Filho
> {skype, twitter, gtalk}: erlfilho
> http://www.inf.ufpr.br/erlf07/
>


Re: SSH problem in hadoop installation

2011-01-24 Thread Konstantin Boudnik
This has been discussed in great details here:
  
http://lmgtfy.com/?q=ssh_exchange_identification%3A+Connection+closed+by+remote+host
--
  Take care,
Konstantin (Cos) Boudnik




On Mon, Jan 24, 2011 at 22:07, real great..
 wrote:
> Hi,
> Am trying to install Hadoop on a linux cluster(Fedora 12).
> However, am not able to SSH to localhost and gives the following error.
>
> *ssh_exchange_identification: Connection closed by remote host*
>
> I know this is not the correct forum for asking this question. Yet it could
> solve a lot of my time if any of you could help me.
> Thanks,
>
>
>
> --
> Regards,
> R.V.
>


Re: MRUnit and Herriot

2011-02-02 Thread Konstantin Boudnik
(Moving to common-user where this belongs)

Herriot is a system test framework which runs against a real physical
cluster deployed with a specially crafted build of Hadoop. That
instrumented build provides extra APIs not available in Hadoop
otherwise. These APIs are created to facilitate cluster software
testability. Herriot isn't limited to MR but also covers (although to
a somewhat lesser extent) the HDFS side of Hadoop.

MRUnit is for MR job "unit" testing, as in making sure that your MR job
is ok and/or allowing you to debug it locally before deployment at scale.

So, long story short - they are very different ;) Herriot can do
intricate fault injection and can work closely with a deployed cluster
(say control Hadoop nodes and daemons); MRUnit is focused on MR jobs
testing.
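To give you an idea, a single-map check with MRUnit looks more or less like
this inside a JUnit test method (WordCountMapper and the key/value types are
hypothetical stand-ins for your own mapper; MapDriver lives in
org.apache.hadoop.mrunit):

  MapDriver<LongWritable, Text, Text, IntWritable> driver =
      new MapDriver<LongWritable, Text, Text, IntWritable>(new WordCountMapper());
  driver.withInput(new LongWritable(1), new Text("hadoop hadoop"))
        .withOutput(new Text("hadoop"), new IntWritable(1))
        .withOutput(new Text("hadoop"), new IntWritable(1))
        .runTest();

No cluster and no HDFS involved - it just drives your Mapper in-process and
asserts on the emitted key/value pairs.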

Hope it helps.
--
  Take care,
Konstantin (Cos) Boudnik


On Wed, Feb 2, 2011 at 05:44, Edson Ramiro  wrote:
> Hi all,
>
> Plz, could you explain me the difference between MRUnit and Herriot?
>
> I've read the documentation of both and they seem very similar to me.
>
> Is Herriot an evolution of MRUnit?
>
> What can Herriot do that MRUnit can't?
>
> Thanks in Advance
>
> --
> Edson Ramiro Lucas Filho
> {skype, twitter, gtalk}: erlfilho
> http://www.inf.ufpr.br/erlf07/
>


Re: MRUnit and Herriot

2011-02-03 Thread Konstantin Boudnik
Yes, Herriot can be used for integration tests of MR. Unit test is a very
different thing and normally is done against a 'unit of compilation' e.g. a
class, etc. Typically you won't expect to do unit tests against a deployed
cluster.

There is a fault injection framework which works at the level of functional tests
(with mini-clusters). Shortly we'll be opening up an initial version of a smoke and
integration test framework (Maven and JUnit based).

It'd be easier to provide you with a hint if you care to explain what you're
trying to solve.

Cos

On Thu, Feb 03, 2011 at 10:25AM, Edson Ramiro wrote:
> Thank you a lot Konstantin, you cleared my mind.
> 
> So, Herriot is a framework designed to test Hadoop as a whole, and (IMHO) is
> a tool for help Hadoop developers and not for who is developing MR programs,
> but can we use Herriot to do unit, integration or other tests on our MR
> jobs?
> 
> Do you know another test tool or test framework for Hadoop?
> 
> Thanks in Advance
> 
> --
> Edson Ramiro Lucas Filho
> {skype, twitter, gtalk}: erlfilho
> http://www.inf.ufpr.br/erlf07/
> 
> 
> On Wed, Feb 2, 2011 at 4:58 PM, Konstantin Boudnik  wrote:
> 
> > (Moving to common-user where this belongs)
> >
> > Herriot is system test framework which runs against a real physical
> > cluster deployed with a specially crafted build of Hadoop. That
> > instrumented build of provides an extra APIs not available in Hadoop
> > otherwise. These APIs are created to facilitate cluster software
> > testability. Herriot isn't limited by MR but also covered (although in
> > a somewhat lesser extend) HDFS side of Hadoop.
> >
> > MRunit is for MR job "unit" testing as in making sure that your MR job
> > is ok and/or to allow you to debug it locally before scale deployment.
> >
> > So, long story short - they are very different ;) Herriot can do
> > intricate fault injection and can work closely with a deployed cluster
> > (say control Hadoop nodes and daemons); MRUnit is focused on MR jobs
> > testing.
> >
> > Hope it helps.
> > --
> >   Take care,
> > Konstantin (Cos) Boudnik
> >
> >
> > On Wed, Feb 2, 2011 at 05:44, Edson Ramiro  wrote:
> > > Hi all,
> > >
> > > Plz, could you explain me the difference between MRUnit and Herriot?
> > >
> > > I've read the documentation of both and they seem very similar to me.
> > >
> > > Is Herriot an evolution of MRUnit?
> > >
> > > What can Herriot do that MRUnit can't?
> > >
> > > Thanks in Advance
> > >
> > > --
> > > Edson Ramiro Lucas Filho
> > > {skype, twitter, gtalk}: erlfilho
> > > http://www.inf.ufpr.br/erlf07/
> > >
> >




Re: MRUnit and Herriot

2011-02-07 Thread Konstantin Boudnik
On Mon, Feb 7, 2011 at 04:20, Edson Ramiro  wrote:
> Well, I'm studying the Hadoop test tools to evaluate some (if there are)
> deficiences, also trying to compare these tools to see what one cover that
> other doesn't and what is possible to do with each one.

There's also a simulated test cluster infrastructure, MiniDFSCluster
and MiniMRCluster, that allows you to develop functional tests without
an actual cluster deployment.
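They live in the Hadoop test jar rather than the core one, and a 0.20-style
usage inside a JUnit test looks roughly like this (constructor signatures
have moved around between releases, so treat it as a sketch):

  Configuration conf = new Configuration();
  MiniDFSCluster dfs = new MiniDFSCluster(conf, 2, true, null); // 2 datanodes
  FileSystem fs = dfs.getFileSystem();
  MiniMRCluster mr = new MiniMRCluster(2, fs.getUri().toString(), 1);
  // ... submit a job via mr.createJobConf() and assert on its output in fs ...
  mr.shutdown();
  dfs.shutdown();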

> As far as I know we have just Herriot and MRUnit for test, and them do
> different things as you said me :)
>
> I'm very interested in your initial version, is there a link?

Not at the moment, but I will send it here as soon as a initial
version is pushed out.

>
> Thanks in advance
>
> --
> Edson Ramiro Lucas Filho
> {skype, twitter, gtalk}: erlfilho
> http://www.inf.ufpr.br/erlf07/
>
>
> On Fri, Feb 4, 2011 at 3:40 AM, Konstantin Boudnik  wrote:
>
>> Yes, Herriot can be used for integration tests of MR. Unit test is a very
>> different thing and normally is done against a 'unit of compilation' e.g. a
>> class, etc. Typically you won't expect to do unit tests against a deployed
>> cluster.
>>
>> There is fault injection framework wich works at the level of functional
>> tests
>> (with mini-clusters). Shortly we'll be opening an initial version of smoke
>> and
>> integration test framework (maven and JUnit based).
>>
>> It'd be easier to provide you with a hint if you care to explain what
>> you're
>> trying to solve.
>>
>> Cos
>>
>> On Thu, Feb 03, 2011 at 10:25AM, Edson Ramiro wrote:
>> > Thank you a lot Konstantin, you cleared my mind.
>> >
>> > So, Herriot is a framework designed to test Hadoop as a whole, and (IMHO)
>> is
>> > a tool for help Hadoop developers and not for who is developing MR
>> programs,
>> > but can we use Herriot to do unit, integration or other tests on our MR
>> > jobs?
>> >
>> > Do you know another test tool or test framework for Hadoop?
>> >
>> > Thanks in Advance
>> >
>> > --
>> > Edson Ramiro Lucas Filho
>> > {skype, twitter, gtalk}: erlfilho
>> > http://www.inf.ufpr.br/erlf07/
>> >
>> >
>> > On Wed, Feb 2, 2011 at 4:58 PM, Konstantin Boudnik 
>> wrote:
>> >
>> > > (Moving to common-user where this belongs)
>> > >
>> > > Herriot is system test framework which runs against a real physical
>> > > cluster deployed with a specially crafted build of Hadoop. That
>> > > instrumented build of provides an extra APIs not available in Hadoop
>> > > otherwise. These APIs are created to facilitate cluster software
>> > > testability. Herriot isn't limited by MR but also covered (although in
>> > > a somewhat lesser extend) HDFS side of Hadoop.
>> > >
>> > > MRunit is for MR job "unit" testing as in making sure that your MR job
>> > > is ok and/or to allow you to debug it locally before scale deployment.
>> > >
>> > > So, long story short - they are very different ;) Herriot can do
>> > > intricate fault injection and can work closely with a deployed cluster
>> > > (say control Hadoop nodes and daemons); MRUnit is focused on MR jobs
>> > > testing.
>> > >
>> > > Hope it helps.
>> > > --
>> > >   Take care,
>> > > Konstantin (Cos) Boudnik
>> > >
>> > >
>> > > On Wed, Feb 2, 2011 at 05:44, Edson Ramiro  wrote:
>> > > > Hi all,
>> > > >
>> > > > Plz, could you explain me the difference between MRUnit and Herriot?
>> > > >
>> > > > I've read the documentation of both and they seem very similar to me.
>> > > >
>> > > > Is Herriot an evolution of MRUnit?
>> > > >
>> > > > What can Herriot do that MRUnit can't?
>> > > >
>> > > > Thanks in Advance
>> > > >
>> > > > --
>> > > > Edson Ramiro Lucas Filho
>> > > > {skype, twitter, gtalk}: erlfilho
>> > > > http://www.inf.ufpr.br/erlf07/
>> > > >
>> > >
>>
>>
>>
>


Re: hadoop infrastructure questions (production environment)

2011-02-09 Thread Konstantin Boudnik
On Wed, Feb 9, 2011 at 02:37, Steve Loughran  wrote:
> On 08/02/11 15:45, Oleg Ruchovets wrote:
...
>>    2)  Currently adding additional machine to the greed we need manually
>> maintain all files and configurations.
>>          Is it possible to auto-deploy hadoop servers without the need to
>> manually define each one on all nodes?
>
> That's the only way people do it in production clusters: you use
> Configuration Management (CM) tools. Which one you use is your choice, but
> do use one.

You can go with something like Chef or Puppet: these seem to be quite
popular among Hadoop ops nowadays.

Cos


Re: MRUnit and Herriot

2011-02-10 Thread Konstantin Boudnik
On Thu, Feb 10, 2011 at 08:39, Edson Ramiro  wrote:
> Hi,
>
> I took a look around on the Internet, but I didn't find any docs about
> MiniDFS
> and MiniMRCluster. Is there docs about them?
>
> It remember me this phrase I got from the Herriot [1] page.
> "As always your best source of information and knowledge about any software
> system is its source code" :)

Yes, this still holds ;) Source code is your best friend for a number
of reasons:
  - this is _the_ best documentation for the code and shows what an
application does
  - it is always up-to-date
  - developers can focus on their development/testing rather than
writing end-user documents about internals (which no-one but
other developers will ever need)

> Do you think is possible to have just one tool to cover all kinds of tests?

Sure, why not? I am also a big believer that a single OS would do just fine.

> Another question, do you know if is possible to evaluate a MR program, eg
> sort, with Herriot considering several test data?

Absolutely... Herriot does run workloads against a physical cluster.
So, I don't see why it would be impossible. Would it be the most effective use
of your time? Perhaps not, because Herriot requires a specially
tailored (instrumented) cluster to be executed against.

What you need, I think, is a simple way to get a jar file containing
some tests, drop it onto a cluster's gateway machine and run them. That looks
like what we are trying to achieve in the iTest effort I mentioned
earlier.

Cos

> Thanks in Advance
>
> --
> Edson Ramiro Lucas Filho
> {skype, twitter, gtalk}: erlfilho
> http://www.inf.ufpr.br/erlf07/
>
>
> On Mon, Feb 7, 2011 at 10:29 PM, Konstantin Boudnik  wrote:
>
>> On Mon, Feb 7, 2011 at 04:20, Edson Ramiro  wrote:
>> > Well, I'm studying the Hadoop test tools to evaluate some (if there are)
>> > deficiences, also trying to compare these tools to see what one cover
>> that
>> > other doesn't and what is possible to do with each one.
>>
>> There's also a simulated test cluster infrastructure called MiniDFS
>> and MiniMRCluster to allow you to develop functional tests without
>> actual cluster deployment.
>>
>> > As far as I know we have just Herriot and MRUnit for test, and them do
>> > different things as you said me :)
>> >
>> > I'm very interested in your initial version, is there a link?
>>
>> Not at the moment, but I will send it here as soon as a initial
>> version is pushed out.
>>
>> >
>> > Thanks in advance
>> >
>> > --
>> > Edson Ramiro Lucas Filho
>> > {skype, twitter, gtalk}: erlfilho
>> > http://www.inf.ufpr.br/erlf07/
>> >
>> >
>> > On Fri, Feb 4, 2011 at 3:40 AM, Konstantin Boudnik 
>> wrote:
>> >
>> >> Yes, Herriot can be used for integration tests of MR. Unit test is a
>> very
>> >> different thing and normally is done against a 'unit of compilation'
>> e.g. a
>> >> class, etc. Typically you won't expect to do unit tests against a
>> deployed
>> >> cluster.
>> >>
>> >> There is fault injection framework wich works at the level of functional
>> >> tests
>> >> (with mini-clusters). Shortly we'll be opening an initial version of
>> smoke
>> >> and
>> >> integration test framework (maven and JUnit based).
>> >>
>> >> It'd be easier to provide you with a hint if you care to explain what
>> >> you're
>> >> trying to solve.
>> >>
>> >> Cos
>> >>
>> >> On Thu, Feb 03, 2011 at 10:25AM, Edson Ramiro wrote:
>> >> > Thank you a lot Konstantin, you cleared my mind.
>> >> >
>> >> > So, Herriot is a framework designed to test Hadoop as a whole, and
>> (IMHO)
>> >> is
>> >> > a tool for help Hadoop developers and not for who is developing MR
>> >> programs,
>> >> > but can we use Herriot to do unit, integration or other tests on our
>> MR
>> >> > jobs?
>> >> >
>> >> > Do you know another test tool or test framework for Hadoop?
>> >> >
>> >> > Thanks in Advance
>> >> >
>> >> > --
>> >> > Edson Ramiro Lucas Filho
>> >> > {skype, twitter, gtalk}: erlfilho
>> >> > http://www.inf.ufpr.br/erlf07/
>> >> >
>> >> >
>> >> > On Wed, Feb 2, 2011 at 4:58 PM, Konstantin Boudnik 
>> >> wrote:
>> &

Re: hadoop 0.20 append - some clarifications

2011-02-10 Thread Konstantin Boudnik
You might also want to check append design doc published at HDFS-265
--
  Take care,
Konstantin (Cos) Boudnik




On Thu, Feb 10, 2011 at 07:11, Gokulakannan M  wrote:
> Hi All,
>
> I have run the hadoop 0.20 append branch . Can someone please clarify the
> following behavior?
>
> A writer writing a file but he has not flushed the data and not closed the
> file. Could a parallel reader read this partial file?
>
> For example,
>
> 1. a writer is writing a 10MB file(block size 2 MB)
>
> 2. wrote the file upto 5MB (2 finalized blocks + 1 blockBeingWritten) . note
> that writer is not calling FsDataOutputStream sync( ) at all
>
> 3. now a reader tries to read the above partially written file
>
> I can see that the reader is able to see the partially
> written 5MB data but I feel the reader should be able to see the data only
> after the writer calls the sync() api.
>
> Is this the correct behavior or my understanding is wrong?
>
>
>
>  Thanks,
>
>  Gokul
>
>


Re: question about CDH3

2011-02-15 Thread Konstantin Boudnik
Cross posts are bad
to: common-...@hadoop.apache.org,
cc  common-user@hadoop.apache.org,
Your urgency is understandable but sending a question to different
(and wrong) lists won't help you.

First of all this is HDFS question.

Second of all for CDH related questions please use cdh-u...@cloudera.org list.
--
  Take care,
Konstantin (Cos) Boudnik

On Tue, Feb 15, 2011 at 22:15, springring  wrote:
> Hi,
>    I installed CDH3 following the manual in the attached file,
> but when I run the command
> "su -s /bin/bash -hdfs -c 'hadoop namenode -format'"
> on page 25, it show that "su: invalid option --h",
> so I change the comand to
> "su -s /bin/bash -hdfs -c'hadoop namenode -format'"
> the message is that
> "May not run daemons as root.Please specify HADOOP_NAMENODE_USER"
> So, is there any wrong in my operation?
> Thanks.
>
> Springring.Xu


Re: Hadoop in Real time applications

2011-02-17 Thread Konstantin Boudnik
'cause email is a soft real-time system.
A bank application would be a hard real-time system.

All the difference is in guarantees.
--
  Take care,
Konstantin (Cos) Boudnik

On Thu, Feb 17, 2011 at 05:22, Michael Segel  wrote:
>
> Uhm...
>
> 'Realtime' is relative.
>
> Facebook uses HBase for e-mail, right? Now isn't that a 'realtime' 
> application?
> ;-)
>
> If you're talking about realtime as in like a controller? Or a systems of 
> record for a stock exchange? That wouldn't be a good fit.
>
>
>> Date: Thu, 17 Feb 2011 17:26:04 +0530
>> Subject: Re: Hadoop in Real time applications
>> From: karthik84ku...@gmail.com
>> To: common-user@hadoop.apache.org
>>
>> Hi,
>>
>> Thanks for the clarification.
>>
>> On Thu, Feb 17, 2011 at 2:09 PM, Niels Basjes  wrote:
>>
>> > 2011/2/17 Karthik Kumar :
>> > > Can Hadoop be used for Real time Applications such as banking
>> > solutions...
>> >
>> > Hadoop consists of several components.
>> > Components like HDFS and HBase are quite suitable for "interactive"
>> > solutions (as in: I usually get an answer within 0.x seconds).
>> > If you really need "realtime" (as in: I want a guarantee that I have
>> > an answer within 0.x seconds) the answer is: No, HDFS/HBase cannot
>> > guarantee that.
>> > Other components like MapReduce (and Hive which run on top of
>> > MapReduce) are purely batch oriented.
>> >
>> > --
>> > Met vriendelijke groeten,
>> >
>> > Niels Basjes
>> >
>>
>>
>>
>> --
>> With Regards,
>> Karthik
>


Re: benchmark choices

2011-02-18 Thread Konstantin Boudnik
On Fri, Feb 18, 2011 at 14:35, Ted Dunning  wrote:
> I just read the malstone report.  They report times for a Java version that
> is many (5x) times slower than for a streaming implementation.  That single
> fact indicates that the Java code is so appallingly bad that this is a very
> bad benchmark.

Slow Java code? That's funny ;) Running with Hotspot on by any chance?

> On Fri, Feb 18, 2011 at 2:27 PM, Jim Falgout wrote:
>
>> We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the
>> data and the queries, if not the query generator. There is a Jira issue in
>> Hive that discusses the TPC-H "benchmark" if you're interested. Sorry, I
>> don't remember the issue number offhand.
>>
>> -Original Message-
>> From: Shrinivas Joshi [mailto:jshrini...@gmail.com]
>> Sent: Friday, February 18, 2011 3:32 PM
>> To: common-user@hadoop.apache.org
>> Subject: benchmark choices
>>
>> Which workloads are used for serious benchmarking of Hadoop clusters? Do
>> you care about any of the following workloads :
>> TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench, NNBench,
>> sample apps shipped with Hadoop distro like PiEstimator, dbcount etc.
>>
>> Thanks,
>> -Shrinivas
>>
>>
>


Re: multiple hadoop instances on same cluster

2011-02-21 Thread Konstantin Boudnik
Make sure the instances' ports aren't conflicting and all directories
(NN, JT, etc.) are unique. That should do it.
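Concretely, that means giving the second instance its own set of at least
these (the host names, ports and paths below are just an example):

  <!-- core-site.xml -->
  <property><name>fs.default.name</name><value>hdfs://master2:9100</value></property>
  <property><name>hadoop.tmp.dir</name><value>/data/hadoop2/tmp</value></property>

  <!-- hdfs-site.xml -->
  <property><name>dfs.name.dir</name><value>/data/hadoop2/dfs/name</value></property>
  <property><name>dfs.data.dir</name><value>/data/hadoop2/dfs/data</value></property>

  <!-- mapred-site.xml -->
  <property><name>mapred.job.tracker</name><value>master2:9101</value></property>
  <property><name>mapred.local.dir</name><value>/data/hadoop2/mapred/local</value></property>

If the two instances share any physical nodes you also have to move the web
UI and daemon ports (dfs.http.address, dfs.datanode.address,
mapred.job.tracker.http.address, etc.), otherwise the second set of daemons
will fail to bind.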
--
  Take care,
Konstantin (Cos) Boudnik

On Mon, Feb 21, 2011 at 20:09, Gang Luo  wrote:
> Hello folks,
> I am trying to run multiple hadoop instances on the same cluster. I find it 
> hard
> to share. First I try two  instances, each of them run with the same master 
> and
> slaves. Only one of them could work. I try to divide the cluster such that
> hadoop 1 use machine 0-9 and hadoop 2 uses machine 10-19. Still, only one of
> them could work. The HDFS of the second hadoop is working well, but
> start-mapred.sh will result in such exception "java.io.IOException: Connection
> reset by peer" in the log.
>
>
> Any ideas on this or suggestion on how to run multiple hadoop instance on one
> cluster? I can total divide up the cluster such that different instances run 
> on
> different set of machines.
>
> Thanks.
>
> -Gang
>
>
>
>
>


Re: benchmark choices

2011-02-22 Thread Konstantin Boudnik
Adding Roman Shaposhnik to the list who's "tasked" with benchmarking @Cloudera

On Mon, Feb 21, 2011 at 12:39, Shrinivas Joshi  wrote:
> I wonder what companies like Amazon, Cloudera, RackSpace, Facebook, Yahoo
> etc. look at for the purpose of benchmarking. I guess GridMix v3 might be of
> more interest to Yahoo.
>
> I would appreciate if someone can comment more on this.
>
> Thanks,
> -Shrinivas
>
> On Fri, Feb 18, 2011 at 4:50 PM, Konstantin Boudnik  wrote:
>>
>> On Fri, Feb 18, 2011 at 14:35, Ted Dunning  wrote:
>> > I just read the malstone report.  They report times for a Java version
>> > that
>> > is many (5x) times slower than for a streaming implementation.  That
>> > single
>> > fact indicates that the Java code is so appallingly bad that this is a
>> > very
>> > bad benchmark.
>>
>> Slow Java code? That's funny ;) Running with Hotspot on by any chance?
>>
>> > On Fri, Feb 18, 2011 at 2:27 PM, Jim Falgout
>> > wrote:
>> >
>> >> We use MalStone and TeraSort. For Hive, you can use TPC-H, at least the
>> >> data and the queries, if not the query generator. There is a Jira issue
>> >> in
>> >> Hive that discusses the TPC-H "benchmark" if you're interested. Sorry,
>> >> I
>> >> don't remember the issue number offhand.
>> >>
>> >> -Original Message-
>> >> From: Shrinivas Joshi [mailto:jshrini...@gmail.com]
>> >> Sent: Friday, February 18, 2011 3:32 PM
>> >> To: common-user@hadoop.apache.org
>> >> Subject: benchmark choices
>> >>
>> >> Which workloads are used for serious benchmarking of Hadoop clusters?
>> >> Do
>> >> you care about any of the following workloads :
>> >> TeraSort, GridMix v1, v2, or v3, MalStone, CloudBurst, MRBench,
>> >> NNBench,
>> >> sample apps shipped with Hadoop distro like PiEstimator, dbcount etc.
>> >>
>> >> Thanks,
>> >> -Shrinivas
>> >>
>> >>
>> >
>
>


Re: Some question about fault Injection

2011-02-22 Thread Konstantin Boudnik
Hi Hao.

Yes, you should be able to instrument any part of Hadoop including
mapreduce daemons. Good examples of how to inject faults into Hadoop
are the fault injection tests you can find in HDFS (under
src/test/aop/org/apache/hadoop). I believe MapReduce doesn't have any
fault injection tests yet, as most of our focus until now has been on the
HDFS side.

Please keep in mind that you need to write your own AspectJ
implementation of the faults (for the weaving) and then you have to
make sure that the instrumented jar files are deployed to the cluster
(i.e. the woven faults are present in the actual cluster jar files).

Out of the three, only the second one is in aop.xml:
>% ant jar-fault-inject
>% ant jar-test-fault-inject
>% ant run-test-hdfs-fault-inject
The rest are located in the main build.xml. Pretty ugly, I think, but I
had to work around certain ant limitations and that was the least harmful
workaround of all.

Please feel free to ask further questions if something won't be clear.
--
  Take care,
Konstantin (Cos) Boudnik



On Mon, Feb 21, 2011 at 05:30, Hao Zhu  wrote:
> Dear cos:
>
>      Nice to meet you.
>      I have a couple of questions about your great job: fault injection
> project.
>      First,  is this framework possible to inject a Node-failure fault into
> mapreduce? Because i would like to simulate node failure described in
> Google's paper in my cluster.
>           if it possible, please give me some clues to achieve that goal.
>      Second, i am a New to AOP. So, I just do not quite understand the
> following command written in your guide:
>            % ant jar-fault-inject
>            % ant jar-test-fault-inject
>            % ant run-test-hdfs-fault-inject
>            but i check out the file: hdfs/src/test/aop/build/aop.xml, i
> could not find out the corresponding code in that file.
>      I work a project to run gridmix workload on my cluster with some node's
> random failure. So the first question is really important to me.
>
>      Finally, look forward to hear from you.
> Best Regards
> Hao Zhu


Re: Hadoop Testing?

2011-03-17 Thread Konstantin Boudnik
We have just pushed an update of the stack validation framework (the one
Roman and I presented at eBay a few weeks ago) which allows you to
formalize and simplify Hadoop testing. It is still at a pre-beta stage
(e.g. no user docs are ready yet), but it is working and already has a
lot of merit. The framework lets you seamlessly reuse existing
Hadoop-oriented test artifacts available from a variety of components
such as Pig, Sqoop, Hadoop proper, Hive, etc.

Please check github.com/cloudera/iTest - more is coming daily. And feel
free to contribute ;)

--
  Take care,
Konstantin (Cos) Boudnik

On Thu, Mar 17, 2011 at 10:48, Anandkumar R  wrote:
> Dear Friends,
>
> I am Anandkumar, working as a test engineer at eBay, and we use Hadoop
> extensively to store our logs. I am in a situation where I need to validate
> whether our data is reaching the Hadoop infrastructure correctly or not. Could
> any of you recommend the best testing methodologies, and if there is an
> existing framework for testing Hadoop, please point me to it.
>
> My scenario is simple: a client will dump millions of records into Hadoop, and
> I need to validate that the data has reached Hadoop intact with no data loss,
> and also to do other testing such as scalability and reliability.
>
> Anticipating your support
>
> Thanks,
> Anandkumar
>


Re: How to insert some print codes into Hadoop?

2011-03-23 Thread Konstantin Boudnik
[Moving to common-user@, Bcc'ing general@]

If you know where you need your print statements, you can use
AspectJ to inject the needed Java code into the desired spots at
runtime. You don't even need to touch the source code for that - just
instrument (weave) the jar file.
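
For example (just a sketch - the traced method below is arbitrary, point the
pointcut at whatever spots you actually care about), an annotation-style
aspect like this prints a line on every matching call without touching the
Hadoop sources:

    import java.util.Arrays;
    import org.aspectj.lang.JoinPoint;
    import org.aspectj.lang.annotation.Aspect;
    import org.aspectj.lang.annotation.Before;

    @Aspect
    public class TraceCreateCalls {
      // Print the signature and arguments of every FileSystem.create(...) call.
      @Before("execution(* org.apache.hadoop.fs.FileSystem.create(..))")
      public void logEntry(JoinPoint jp) {
        System.err.println("entering " + jp.getSignature()
            + " args=" + Arrays.toString(jp.getArgs()));
      }
    }

Since FileSystem lives in common, weaving this into the common jar alone (and
putting aspectjrt.jar on the classpath) is enough - which also answers your
first question: you only need to re-instrument the one jar you care about.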
--
  Take care,
Konstantin (Cos) Boudnik

On Tue, Mar 22, 2011 at 09:19, Bo Sang  wrote:
> Hi, guys:
>
> I would like to make some minor modifications to Hadoop (just to insert some
> print statements in particular places), and I have the following questions:
>
> 1. It seems there are three parts of Hadoop: common, hdfs, and mapred, and they
> are packaged as three independent jars. Could I modify only one part
> (e.g. common) and build a new jar without modifying the other two?
>
> 2. I have tried to import the folder hadoop-0.21.0/common into Eclipse as a
> project, but Eclipse fails to recognize it as an existing project. If I
> import the folder hadoop-0.21.0 as an existing project, it works. However, I
> only want to modify the common part. How could I modify only the common part
> and export a new common jar without modifying the other two parts?
>
> --
> Best Regards!
>
> Sincerely
> Bo Sang
>


Re: ant version problem

2011-03-26 Thread Konstantin Boudnik
This

Apache Ant version 1.8.0 compiled on February 1 2010

should be just fine. I think you need something later than 1.7.2 or so.
--
  Take care,
Konstantin (Cos) Boudnik



On Sat, Mar 26, 2011 at 18:01, Daniel McEnnis  wrote:
> Dear Hadoop,
>
> Which version of Ant do I need to keep the Hadoop build from failing?
> The NetBeans Ant works, as does the Eclipse Ant. However, Ant 1.8.2
> does not, nor does the default Ant from Ubuntu 10.10.
> A snippet from the failure follows:
>
> record-parser:
>
> compile-rcc-compiler:
>   [javac] /home/user/src/trunk/build.xml:333: warning:
> 'includeantruntime' was not set, defaulting to
> build.sysclasspath=last; set to false for repeatable builds
>
> BUILD FAILED
> /home/user/src/trunk/build.xml:338: taskdef A class needed by class
> org.apache.hadoop.record.compiler.ant.RccTask cannot be found: Task
>  using the classloader
> AntClassLoader[/home/user/src/trunk/build/classes:/home/user/src/trunk/conf:/home/user/.ivy2/cache/commons-logging/commons-logging/jars/commons-logging-1.1.1.jar:/home/user/.ivy2/cache/log4j/log4j/jars/log4j-1.2.15.jar:/home/user/.ivy2/cache/commons-httpclient/commons-httpclient/jars/commons-httpclient-3.1.jar:/home/user/.ivy2/cache/commons-codec/commons-codec/jars/commons-codec-1.4.jar:/home/user/.ivy2/cache/commons-cli/commons-cli/jars/commons-cli-1.2.jar:/home/user/.ivy2/cache/xmlenc/xmlenc/jars/xmlenc-0.52.jar:/home/user/.ivy2/cache/net.java.dev.jets3t/jets3t/jars/jets3t-0.7.1.jar:/home/user/.ivy2/cache/commons-net/commons-net/jars/commons-net-1.4.1.jar:/home/user/.ivy2/cache/org.mortbay.jetty/servlet-api-2.5/jars/servlet-api-2.5-6.1.14.jar:/home/user/.ivy2/cache/net.sf.kosmosfs/kfs/jars/kfs-0.3.jar:/home/user/.ivy2/cache/org.mortbay.jetty/jetty/jars/jetty-6.1.14.jar:/home/user/.ivy2/cache/org.mortbay.jetty/jetty-util/jars/jetty-util-6.1.14.jar:/home/user/.ivy2/cache/tomcat/jasper-runtime/jars/jasper-runtime-5.5.12.jar:/home/user/.ivy2/cache/tomcat/jasper-compiler/jars/jasper-compiler-5.5.12.jar:/home/user/.ivy2/cache/org.mortbay.jetty/jsp-api-2.1/jars/jsp-api-2.1-6.1.14.jar:/home/user/.ivy2/cache/org.mortbay.jetty/jsp-2.1/jars/jsp-2.1-6.1.14.jar:/home/user/.ivy2/cache/commons-el/commons-el/jars/commons-el-1.0.jar:/home/user/.ivy2/cache/oro/oro/jars/oro-2.0.8.jar:/home/user/.ivy2/cache/jdiff/jdiff/jars/jdiff-1.0.9.jar:/home/user/.ivy2/cache/junit/junit/jars/junit-4.8.1.jar:/home/user/.ivy2/cache/hsqldb/hsqldb/jars/hsqldb-1.8.0.10.jar:/home/user/.ivy2/cache/commons-logging/commons-logging-api/jars/commons-logging-api-1.1.jar:/home/user/.ivy2/cache/org.slf4j/slf4j-api/jars/slf4j-api-1.5.11.jar:/home/user/.ivy2/cache/org.eclipse.jdt/core/jars/core-3.1.1.jar:/home/user/.ivy2/cache/org.slf4j/slf4j-log4j12/jars/slf4j-log4j12-1.5.11.jar:/home/user/.ivy2/cache/org.apache.hadoop/avro/jars/avro-1.3.2.jar:/home/user/.ivy2/cache/org.codehaus.jackson/jackson-mapper-asl/jars/jackson-mapper-asl-1.4.2.jar:/home/user/.ivy2/cache/org.codehaus.jackson/jackson-core-asl/jars/jackson-core-asl-1.4.2.jar:/home/user/.ivy2/cache/com.thoughtworks.paranamer/paranamer/jars/paranamer-2.2.jar:/home/user/.ivy2/cache/com.thoughtworks.paranamer/paranamer-ant/jars/paranamer-ant-2.2.jar:/home/user/.ivy2/cache/com.thoughtworks.paranamer/paranamer-generator/jars/paranamer-generator-2.2.jar:/home/user/.ivy2/cache/com.thoughtworks.qdox/qdox/jars/qdox-1.10.1.jar:/home/user/.ivy2/cache/asm/asm/jars/asm-3.2.jar:/home/user/.ivy2/cache/commons-lang/commons-lang/jars/commons-lang-2.5.jar:/home/user/.ivy2/cache/org.aspectj/aspectjrt/jars/aspectjrt-1.6.5.jar:/home/user/.ivy2/cache/org.aspectj/aspectjtools/jars/aspectjtools-1.6.5.jar:/home/user/.ivy2/cache/org.mockito/mockito-all/jars/mockito-all-1.8.2.jar:/home/user/.ivy2/cache/com.jcraft/jsch/jars/jsch-0.1.42.jar]
>
> Total time: 6 seconds
>
> Sincerely,
>
> Daniel McEnnis.
>


Re: why local fs instead of hdfs

2011-04-14 Thread Konstantin Boudnik
It seems like something is setting fs.default.name programmatically.
Another possibility is that $HADOOP_CONF_DIR isn't on the classpath in
the second case.
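
A quick way to tell which of the two it is (just a sketch - the /input path is
an arbitrary example) is to print what the driver's Configuration actually
resolves to:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class WhichFs {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Without core-site.xml on the classpath this falls back to file:///
        System.out.println("fs.default.name = " + conf.get("fs.default.name"));
        // Shows which filesystem a path really resolves against
        System.out.println(FileSystem.get(conf).makeQualified(new Path("/input")));
        // Note: a call like conf.set("fs.default.name", "file:///") anywhere in
        // the driver would silently override whatever the XML files say.
      }
    }

If it prints file:/..., fix the classpath (or the programmatic override) rather
than the input path itself.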

Hope it helps,
  Cos

On Thu, Apr 14, 2011 at 20:24, Gang Luo  wrote:
> Hi all,
>
> A tricky problem here. When we prepare an input path, it should be a path on
> HDFS by default, right? Under what conditions does it become a path on the
> local file system? I followed a program which worked well, and its input path
> was something like "hdfs://...". But when I apply a similar driver class to run
> a different program, the input path becomes "file:/..." and doesn't work. What
> is the problem?
>
> Thanks.
>
> -Gang
>


Re: Why is JUnit a compile scope dependency?

2011-04-29 Thread Konstantin Boudnik
Yes, this seems to be a dependency declaration bug. Not a big deal, but still.
Would you care to open a JIRA under https://issues.apache.org/jira/browse/HADOOP?

Thanks,
   Cos

On Fri, Apr 29, 2011 at 07:03, Juan P.  wrote:
> I was putting together a maven project and imported hadoop-core as a
> dependency and noticed that among the jars it brought with it was JUnit 4.5.
> Shouldn't it be a test scope dependency? It also happens with JUnit 3.8.1
> for the commons-httpclient-3.0.1 dependency it pulls down from the repo.
>
> Cheers,
> Juan
>


Re: TestDFSIO Bechmark

2011-05-03 Thread Konstantin Boudnik
[taking common-@ and hdfs-@ lists to Bcc:]
Please do not cross-post.

On Tue, May 3, 2011 at 03:26, baran cakici  wrote:
> Hi,
>
> I want to know the I/O performance of my Hadoop cluster. Because of that I ran
> test.jar; here are my results:
>
> - TestDFSIO - : write
> Date & time: Mon May 02 14:38:29 CEST 2011
> Number of files: 10
> Total MBytes processed: 1
> Throughput mb/sec: 12.809033955468113
> Average IO rate mb/sec: 13.16771411895752
> IO rate std deviation: 2.059995952372142
> Test exec time sec: 964.4
>
> - TestDFSIO - : read
> Date & time: Mon May 02 15:00:10 CEST 2011
> Number of files: 10
> Total MBytes processed: 1
> Throughput mb/sec: 12.938200684430816
> Average IO rate mb/sec: 14.908231735229492
> IO rate std deviation: 4.231364683566859
> Test exec time sec: 941.601
>
> Read speed and write speed are almost the same!!! Why can't I read faster?
>
> thanks,
>
> Baran


Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Konstantin Boudnik
Also, it seems like Ganglia would be very well complemented by Nagios
to allow you to monitor the overall health of your cluster.
--
  Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author,
and do not necessarily represent the views of any company the author
might be affiliated with at the moment of writing.

On Tue, May 17, 2011 at 15:15, Allen Wittenauer  wrote:
>
> On May 17, 2011, at 3:11 PM, Mark question wrote:
>
>> So what other memory consumption tools do you suggest? I don't want to do it
>> manually and dump statistics into file because IO will affect performance
>> too.
>
>        We watch memory with Ganglia.  We also tune our systems such that a 
> task will only take X amount.  In other words, given an 8gb RAM:
>
>        1gb for the OS
>        1gb for the TT and DN
>        6gb for all tasks
>
>        if we assume each task will take max 1gb, then we end up with 3 maps 
> and 3 reducers.
>
>        Keep in mind that the mem consumed is more than just JVM heap size.


Re: Hadoop and WikiLeaks

2011-05-18 Thread Konstantin Boudnik
You are, perhaps, aware that now your name will be associated with
WikiLeaks too because this mailing list is archived and publicly
searchable? I think you are a hero, man!
--
  Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author,
and do not necessarily represent the views of any company the author
might be affiliated with at the moment of writing.



On Wed, May 18, 2011 at 09:53, Edward Capriolo  wrote:
> http://hadoop.apache.org/#What+Is+Apache%E2%84%A2+Hadoop%E2%84%A2%3F
>
> March 2011 - Apache Hadoop takes top prize at Media Guardian Innovation
> Awards
>
> The Hadoop project won the "innovator of the year" award from the UK's
> Guardian newspaper, where it was described as one that "had the potential as a
> greater catalyst for innovation than other nominees including WikiLeaks and
> the iPad."
>
> Does this copy text bother anyone else? Sure, winning any award is great, but
> does Hadoop want to be associated with "innovation" like WikiLeaks?
>
> Edward
>


Re: Hadoop and WikiLeaks

2011-05-22 Thread Konstantin Boudnik
On Sun, May 22, 2011 at 15:30, Edward Capriolo  wrote:
> but for the
> reasons I outlined above I would not want to be associated with them at all.

"I give no damn about your opinion, but I will defend your right to
express it with my blood..."

That said, please do not express such opinions on the Hadoop user list,
simply because common-user@hadoop.apache.org isn't the place to debate
checks and balances.


Re: EC2 cloudera cc1.4xlarge

2011-05-24 Thread Konstantin Boudnik
Try the Cloudera-specific lists with your questions.
--
  Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616  6115 220F 6980 1F27 E622

Disclaimer: Opinions expressed in this email are those of the author,
and do not necessarily represent the views of any company the author
might be affiliated with at the moment of writing.



On Tue, May 24, 2011 at 16:23, Aleksandr Elbakyan  wrote:
> Hello,
>
> I want to use a cc1.4xlarge cluster for some data processing; to spin up
> clusters I am using the Cloudera scripts. hadoop-ec2-init-remote.sh has default
> configurations up to c1.xlarge but no configuration for cc1.4xlarge. Can
> someone give a formula for how these values are calculated based on the hardware?
>
> C1.XLARGE
>     MAX_MAP_TASKS=8 -  mapred.tasktracker.map.tasks.maximum
>     MAX_REDUCE_TASKS=4 - mapred.tasktracker.reduce.tasks.maximum
>     CHILD_OPTS=-Xmx680m - mapred.child.java.opts
>     CHILD_ULIMIT=1392640 - mapred.child.ulimit
>
> I am guessing but I think
>
> CHILD_OPTS = (total ram on the box - 1gb) /(MAX_MAP_TASKS, MAX_REDUCE_TASKS)
>
> But I am not sure how to calculate the rest.
>
> Regards,
> Aleksandr
>
>
>


Re: Unable to start hadoop-0.20.2 but able to start hadoop-0.20.203 cluster

2011-05-27 Thread Konstantin Boudnik
On Thu, May 26, 2011 at 07:01PM, Xu, Richard  wrote:
> 2011-05-26 12:30:29,175 INFO org.apache.hadoop.ipc.Server: IPC Server handler
> 4 on 9000, call addBlock(/tmp/hadoop-cfadm/mapred/system/jobtracker.info,
> DFSClient_2146408809) from 169.193.181.212:55334: error: java.io.IOException:
> File /tmp/hadoop-cfadm/mapred/system/jobtracker.info could only be replicated
> to 0 nodes, instead of 1
> java.io.IOException: File /tmp/hadoop-cfadm/mapred/system/jobtracker.info
> could only be replicated to 0 nodes, instead of 1

Is your DFS up and running, by any chance?

Cos


Re: Starting JobTracker Locally but binding to remote Address

2011-05-31 Thread Konstantin Boudnik
This seems to be your problem, really...
    <name>mapred.job.tracker</name>
    <value>slave2:9001</value>

On Tue, May 31, 2011 at 06:07PM, Juan P. wrote:
> Hi Guys,
> I recently configured my cluster to have 2 VMs. I configured 1
> machine (slave3) to be the namenode and another to be the
> jobtracker (slave2). They both work as datanode/tasktracker as well.
> 
> Both configs have the following contents in their masters and slaves file:
> slave2
> slave3
> 
> Both machines have the following contents on their mapred-site.xml file:
> <configuration>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>slave2:9001</value>
>   </property>
> </configuration>
> 
> Both machines have the following contents on their core-site.xml file:
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://slave3:9000</value>
>   </property>
> </configuration>
> 
> When I log into the namenode and I run the start-all.sh script, everything
> but the jobtracker starts. In the log files I get the following exception:
> 
> /************************************************************
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = slave3/10.20.11.112
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> ************************************************************/
> 2011-05-31 13:54:06,940 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> 2011-05-31 13:54:07,086 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to slave2/10.20.11.166:9001 : Cannot assign requested address
>         at org.apache.hadoop.ipc.Server.bind(Server.java:190)
>         at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:253)
>         at org.apache.hadoop.ipc.Server.<init>(Server.java:1026)
>         at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:488)
>         at org.apache.hadoop.ipc.RPC.getServer(RPC.java:450)
>         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1595)
>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:183)
>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:175)
>         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3702)
> Caused by: java.net.BindException: Cannot assign requested address
>         at sun.nio.ch.Net.bind(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>         at org.apache.hadoop.ipc.Server.bind(Server.java:188)
>         ... 8 more
> 
> 2011-05-31 13:54:07,096 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down JobTracker at slave3/10.20.11.112
> ************************************************************/
> 
> 
> As I see it, from the lines
> 
> STARTUP_MSG: Starting JobTracker
> STARTUP_MSG:   host = slave3/10.20.11.112
> 
> the namenode (slave3) is trying to run the jobtracker locally but when it
> starts the jobtracker server it binds it to the slave2 address and of course
> fails:
> 
> Problem binding to slave2/10.20.11.166:9001
> 
> What do you guys think could be going wrong?
> 
> Thanks!
> Pony


Re: Starting JobTracker Locally but binding to remote Address

2011-05-31 Thread Konstantin Boudnik
On Tue, May 31, 2011 at 06:21PM, gordoslocos wrote:
> Eeeeh, why? Isn't that the config for the jobtracker? Slave2 has been
> defined in my /etc/hosts file.
> Should those lines not be in both nodes?

Indeed, but you are running the MR start script on slave3, meaning that the JT
will be started on slave3 whatever the configuration says: start-mapred.sh isn't
that smart and doesn't check your configs.

Cos

> Thanks for helping!
> Pony
> 
> On 31/05/2011, at 18:12, Konstantin Boudnik  wrote:
> 
> > This seems to be your problem, really...
> >     <name>mapred.job.tracker</name>
> >     <value>slave2:9001</value>
> > 
> > On Tue, May 31, 2011 at 06:07PM, Juan P. wrote:
> >> Hi Guys,
> >> I recently configured my cluster to have 2 VMs. I configured 1
> >> machine (slave3) to be the namenode and another to be the
> >> jobtracker (slave2). They both work as datanode/tasktracker as well.
> >> 
> >> Both configs have the following contents in their masters and slaves file:
> >> slave2
> >> slave3
> >> 
> >> Both machines have the following contents on their mapred-site.xml file:
> >> <configuration>
> >>   <property>
> >>     <name>mapred.job.tracker</name>
> >>     <value>slave2:9001</value>
> >>   </property>
> >> </configuration>
> >> 
> >> Both machines have the following contents on their core-site.xml file:
> >> <configuration>
> >>   <property>
> >>     <name>fs.default.name</name>
> >>     <value>hdfs://slave3:9000</value>
> >>   </property>
> >> </configuration>
> >> 
> >> When I log into the namenode and I run the start-all.sh script, everything
> >> but the jobtracker starts. In the log files I get the following exception:
> >> 
> >> /************************************************************
> >> STARTUP_MSG: Starting JobTracker
> >> STARTUP_MSG:   host = slave3/10.20.11.112
> >> STARTUP_MSG:   args = []
> >> STARTUP_MSG:   version = 0.20.2
> >> STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> >> ************************************************************/
> >> 2011-05-31 13:54:06,940 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)
> >> 2011-05-31 13:54:07,086 FATAL org.apache.hadoop.mapred.JobTracker: java.net.BindException: Problem binding to slave2/10.20.11.166:9001 : Cannot assign requested address
> >>         at org.apache.hadoop.ipc.Server.bind(Server.java:190)
> >>         at org.apache.hadoop.ipc.Server$Listener.<init>(Server.java:253)
> >>         at org.apache.hadoop.ipc.Server.<init>(Server.java:1026)
> >>         at org.apache.hadoop.ipc.RPC$Server.<init>(RPC.java:488)
> >>         at org.apache.hadoop.ipc.RPC.getServer(RPC.java:450)
> >>         at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:1595)
> >>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:183)
> >>         at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:175)
> >>         at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:3702)
> >> Caused by: java.net.BindException: Cannot assign requested address
> >>         at sun.nio.ch.Net.bind(Native Method)
> >>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
> >>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
> >>         at org.apache.hadoop.ipc.Server.bind(Server.java:188)
> >>         ... 8 more
> >> 
> >> 2011-05-31 13:54:07,096 INFO org.apache.hadoop.mapred.JobTracker: SHUTDOWN_MSG:
> >> /************************************************************
> >> SHUTDOWN_MSG: Shutting down JobTracker at slave3/10.20.11.112
> >> ************************************************************/
> >> 
> >> 
> >> As I see it, from the lines
> >> 
> >> STARTUP_MSG: Starting JobTracker
> >> STARTUP_MSG:   host = slave3/10.20.11.112
> >> 
> >> the namenode (slave3) is trying to run the jobtracker locally but when it
> >> starts the jobtracker server it binds it to the slave2 address and of 
> >> course
> >> fails:
> >> 
> >> Problem binding to slave2/10.20.11.166:9001
> >> 
> >> What do you guys think could be going wrong?
> >> 
> >> Thanks!
> >> Pony


Re: About a question of Hadoop

2011-07-01 Thread Konstantin Boudnik
[addressing to common-users@]

This target is there to actually kick off the test execution. Once the
instrumented cluster bits are deployed, you can start the system tests with
the command you've mentioned.

Basically, this is exactly what the wiki page says, I guess.

Cos

On Thu, Jun 30, 2011 at 05:27PM, 王栓奇 wrote:
>Hi, Konstantin:
> 
> I am using Herriot from Hadoop 0.21; however, I have run into a problem. What
> does "test-system" mean in the following ant command:
> 
>  ant test-system -Dhadoop.conf.dir.deployed=${HADOOP_CONF_DIR}
> 
> from the web page: http://wiki.apache.org/hadoop/HowToUseSystemTestFramework
> 
>Thank you very much!
> 
>Shuanqi Wang
>Beijing, China


Re: Am i crazy? - question about hadoop streaming

2011-09-14 Thread Konstantin Boudnik
I am sure that if you ask on the provider's own list you'll get a better answer
than from the common Hadoop list ;)

Cos

On Wed, Sep 14, 2011 at 09:48PM, Mark Kerzner wrote:
> Hi,
> 
> I am using the latest Cloudera distribution, and with that I am able to use
> the latest Hadoop API, which I believe is 0.21, for such things as
> 
> import org.apache.hadoop.mapreduce.Reducer;
> 
> So I am using mapreduce, not mapred, and everything works fine.
> 
> However, in a small streaming job, trying it out with Java classes first, I
> get this error
> 
> Exception in thread "main" java.lang.RuntimeException: class mypackage.Map
> not org.apache.hadoop.mapred.Mapper -- which it really is not, it is a
> mapreduce.Mapper.
> 
> So it seems that Cloudera backports some of the advances but for streaming
> it is still the old API.
> 
> So is it me or the world?
> 
> Thank you,
> Mark

