
Re: map(K1 key, V1 value, OutputCollector output, Reporter reporter) deprecated in 0.20.2?

2010-01-30 Thread steven zhuang
thank you guys  :D


On Sat, Jan 30, 2010 at 2:16 AM, Jim Twensky  wrote:

> Steven,
>
> I recently had the same issues, and I found this blog post very
> helpful on migrating from 0.19 to 0.20.2
>
> http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html
>
> You can download the sample code at the end, which contains a Hadoop
> word count program written using the new API.
>
> Hope this helps.
>
> -Jim
>
> On Thu, Jan 28, 2010 at 9:43 AM, Edward Capriolo 
> wrote:
> > On Thu, Jan 28, 2010 at 8:14 AM, steven zhuang 
> wrote:
> >> hello, all,
> >>    As a newbie, I am used to the (k1, v1, k2, v2) style parameter
> >> list for the map and reduce methods in the mapper and reducer (as is
> >> written in many books), but after several failures I found that in
> >> 0.20+, if we extend the base class org.apache.hadoop.mapreduce.Mapper,
> >> the map method should look something like this:
> >>
> >> void map(KEYIN key, VALUEIN value, Context context) throws
> >> IOException, InterruptedException
> >>   A little confusing to me.
> >>   My question is: why was the old-style map interface
> >> deprecated?
> >> thanks!
> >>
> >>
> >> --
> >>   best wishes.
> >>steven
> >>
> >
> > Steven,
> >
> > The old map/reduce API is still available in org.apache.hadoop.mapred.
> >
> >>   My question is: why was the old-style map interface
> >> deprecated?
> >
> > Cause hadoop is like a freight train, either hop on or get out of the
> > way! :)
> > Just kidding,
> >
> > Great presentation about how to update code:
> > http://www.slideshare.net/sh1mmer/upgrading-to-the-new-map-reduce-api
> >
> > Some information on the 'why'
> >
> http://www.cloudera.com/blog/2009/05/07/what%E2%80%99s-new-in-hadoop-core-020/
> >
>



-- 
   best wishes.
steven
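
For reference, here is a minimal sketch of a mapper written against the new
0.20 API discussed in this thread (org.apache.hadoop.mapreduce). Only the
Mapper/Context signature comes from the discussion above; the class name and
the word-count logic are illustrative:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Word-count mapper against the new API: the single Context argument
// replaces the old OutputCollector/Reporter pair.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Tokenize the input line and emit (word, 1) pairs via the Context.
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}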


Re: Could not obtain block

2010-01-30 Thread MilleBii
Ken,

FIXED !!! THANKS SO MUCH

Setting ulimit at the command prompt wasn't enough; one needs to set it
permanently (hard) and reboot, as explained here:
http://posidev.com/blog/2009/06/04/set-ulimit-parameters-on-ubuntu/
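
For the record, the permanent fix on Ubuntu generally means adding nofile
entries to /etc/security/limits.conf and rebooting; roughly as below, where
the username is just a placeholder for whichever account runs Hadoop/Nutch:

# /etc/security/limits.conf : raise the per-user open-file limit
nutch   soft   nofile   64000
nutch   hard   nofile   64000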




2010/1/30 MilleBii 

> Increased the "ulimit" to 64000 ... same problem.
> stop/start-all ... same problem, but on a different block which is of
> course present, so it looks like there is nothing wrong with the actual
> data in the hdfs.
>
> I use the Nutch default hadoop 0.19.x ... could that be related?
>
> 2010/1/30 Ken Goodhope 
>
>> "Could not obtain block" errors are often caused by running out of
>> available file handles.  You can confirm this by going to the shell and
>> entering "ulimit -n".  If it says 1024, the default, then you will want
>> to increase it to about 64,000.
>>
>> On Fri, Jan 29, 2010 at 4:06 PM, MilleBii  wrote:
>>
>> > X-POST with Nutch mailing list.
>> >
>> > HEEELP !!!
>> >
>> > Kind of stuck on this one.
>> > I backed up my hdfs data, reformatted the hdfs, put the data back, tried
>> > to merge my segments together, and it explodes again.
>> >
>> > Exception in thread "Lucene Merge Thread #0"
>> > org.apache.lucene.index.MergePolicy$MergeException: java.io.IOException:
>> > Could not obtain block: blk_4670839132945043210_1585
>> > file=/user/nutch/crawl/indexed-segments/20100113003609/part-0/_ym.frq
>> >at
>> > org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:309)
>> >
>> > If I go into the hdfs/data directory I DO find the faulty block.
>> > Could it be a synchronization problem in the segment merger code?
>> >
>> > 2010/1/29 MilleBii 
>> >
>> > > I'm looking for some help. I'm a Nutch user; everything was working
>> > > fine, but now I get the following error when indexing.
>> > > I have a single-node pseudo-distributed setup.
>> > > Some people on the Nutch list suggested that hdfs could be full, so I
>> > > removed many things, and hdfs is far from full.
>> > > This file & directory was perfectly OK the day before.
>> > > I did a "hadoop fsck"... the report says healthy.
>> > >
>> > > What can I do?
>> > >
>> > > Is it safe to do a Linux fsck, just in case?
>> > >
>> > > Caused by: java.io.IOException: Could not obtain block:
>> > > blk_8851198258748412820_9031
>> > > file=/user/nutch/crawl/indexed-segments/20100111233601/part-0/_103.frq
>> > >
>> > >
>> > > --
>> > > -MilleBii-
>> > >
>> >
>> >
>> >
>> > --
>> > -MilleBii-
>> >
>>
>>
>>
>> --
>> Ken Goodhope
>> Cell: 425-750-5616
>>
>> 362 Bellevue Way NE Apt N415
>> Bellevue WA, 98004
>>
>
>
>
> --
> -MilleBii-
>



-- 
-MilleBii-


hadoop under cygwin issue

2010-01-30 Thread Brian Wolf


Hi,

I am trying to run Hadoop 0.19.2 under Cygwin, as per the directions on the 
Hadoop "quickstart" web page.


I know sshd is running and I can "ssh localhost" without a password.

This is from my hadoop-site.xml



<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/cygwin/tmp/hadoop-${user.name}</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>-1</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>webinterface.private.actions</name>
    <value>true</value>
  </property>
</configuration>



These are errors from my log files:


2010-01-30 00:03:33,091 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: 
Initializing RPC Metrics with hostName=NameNode, port=9000
2010-01-30 00:03:33,121 INFO 
org.apache.hadoop.hdfs.server.namenode.NameNode: Namenode up at: 
localhost/127.0.0.1:9000
2010-01-30 00:03:33,161 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-01-30 00:03:33,181 INFO 
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics: 
Initializing NameNodeMeterics using context 
object:org.apache.hadoop.metrics.spi.NullContext
2010-01-30 00:03:34,603 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
fsOwner=brian,None,Administrators,Users
2010-01-30 00:03:34,603 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-01-30 00:03:34,603 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: 
isPermissionEnabled=false
2010-01-30 00:03:34,653 INFO 
org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics: 
Initializing FSNamesystemMetrics using context 
object:org.apache.hadoop.metrics.spi.NullContext
2010-01-30 00:03:34,653 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered 
FSNamesystemStatusMBean
2010-01-30 00:03:34,803 INFO 
org.apache.hadoop.hdfs.server.common.Storage: Storage directory 
C:\cygwin\tmp\hadoop-brian\dfs\name does not exist.
2010-01-30 00:03:34,813 ERROR 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem 
initialization failed.
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: 
Directory C:\cygwin\tmp\hadoop-brian\dfs\name is in an inconsistent 
state: storage directory does not exist or is not accessible.
   at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:278)
   at 
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:87)
   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:309)
   at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:288)
   at 
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:163)
   at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:208)
   at 
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:194)
   at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:859)
   at 
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:868)
2010-01-30 00:03:34,823 INFO org.apache.hadoop.ipc.Server: Stopping 
server on 9000






=

2010-01-29 15:13:30,270 INFO org.apache.hadoop.ipc.Client: Retrying 
connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).

problem cleaning system directory: null
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on 
connection exception: java.net.ConnectException: Connection refused: no 
further information

   at org.apache.hadoop.ipc.Client.wrapException(Client.java:724)
   at org.apache.hadoop.ipc.Client.call(Client.java:700)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
   at $Proxy4.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:348)
   at 
org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:104)
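
For reference, the first error above ("storage directory ... does not exist
or is not accessible") usually just means the namenode has never been
formatted, and the later "Connection refused" then follows because the
namenode never came up. Under the 0.19 quickstart the namenode is formatted
once, before the daemons are started, roughly:

  bin/hadoop namenode -format
  bin/start-all.sh

(Both commands are run from the Hadoop installation directory.)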




Thanks
Brian


