Re: PathFilter File Glob

2012-02-28 Thread Idris Ali
Hi,

Why not just use:
FileSystem fs = FileSystem.get(conf);
FileStatus[] files = fs.globStatus(new Path(path+filter));
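For completeness, a slightly fuller sketch of that approach; the input
directory and glob pattern below are placeholders, not from this thread
(it uses org.apache.hadoop.fs and the old org.apache.hadoop.mapred API,
and assumes the same "conf" object as above):

    FileSystem fs = FileSystem.get(conf);
    // expand the glob ourselves, then register only the matching files
    FileStatus[] matches = fs.globStatus(new Path("/user/simon/input/part-*"));
    JobConf job = new JobConf(conf);
    for (FileStatus status : matches) {
      FileInputFormat.addInputPath(job, status.getPath());
    }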

Thanks,
-Idris

On Mon, Feb 27, 2012 at 1:06 PM, Harsh J ha...@cloudera.com wrote:

 Hi Simon,

 You need to implement your custom PathFilter derivative class, and
 then set it via your {File}InputFormat class using setInputPathFilter:


 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/FileInputFormat.html#setInputPathFilter(org.apache.hadoop.mapred.JobConf,%20java.lang.Class)

 (TextInputFormat is a derivative of FileInputFormat, and hence has the
 same method.)
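 As a minimal sketch of that with the old (mapred) API, where the filter
 class and the regex are illustrative rather than anything from this thread
 (and note that, depending on the version, the filter may also be handed
 directories, so test it against your own layout):

   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.fs.PathFilter;
   import org.apache.hadoop.mapred.FileInputFormat;
   import org.apache.hadoop.mapred.JobConf;

   public class RegexPathFilter implements PathFilter {
     // keep only files whose names match the pattern
     public boolean accept(Path path) {
       return path.getName().matches("events-\\d+\\.log");
     }
   }

   // in the job driver:
   JobConf conf = new JobConf();
   FileInputFormat.setInputPathFilter(conf, RegexPathFilter.class);
   FileInputFormat.addInputPath(conf, new Path("/user/simon/input"));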

 HTH.

 2012/2/23 Heeg, Simon s.h...@telekom.de:
  Hello,
 
  I would like to use a PathFilter to filter, with a regular expression, the
 files that are read by the TextInputFormat, but I don't know how to apply
 the filter. I cannot find a setter. Unfortunately Google was not my friend
 with this issue, and The Definitive Guide does not help that much either.
  I am using Hadoop 0.20.2-cdh3u3.
 

 --
 Harsh J



Re: reduce no response

2012-02-01 Thread Idris Ali
Hi Jinyan,

To quickly check whether this has to do with resolving the IP address, can you
check what the hostname of the system is?
A quick hack would be to map the hostname to localhost if you are using
Cloudera's pseudo-distributed Hadoop for testing.
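For example, a rough illustration of that check (the hostname below is just a
placeholder):

    $ hostname
    myhost.example.com
    # Map that name to the loopback address in /etc/hosts so that lookups of
    # the JobTracker/NameNode resolve consistently on a pseudo cluster:
    127.0.0.1   localhost myhost.example.com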

Thanks,
-Idris

On Wed, Feb 1, 2012 at 4:24 PM, Harsh J ha...@cloudera.com wrote:

 Jinyan,

 Am not sure what your problem here seems to be - the client hanging or
 the job itself hanging. Could you provide us some more information on
 what state the job is hung in, or expand on the job client hang?
 Having a jstack also helps whenever you run into a JVM hang.
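 For example (jps and jstack ship with the JDK; the pid is whatever jps
 reports for the hung process):

   $ jps                        # find the pid of the hung JVM (client or task)
   $ jstack <pid> > stack.txt   # capture a thread dump to attach to your reply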

 On Wed, Feb 1, 2012 at 3:46 PM, Jinyan Xu jinyan...@exar.com wrote:
  Hi all,
 
  I run terasort with 4 reducers and 8 maps; when all the maps finish, the reducers give no response.
 
  Debug messages like:
 
  12/02/02 01:42:47 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop sending #7126
  12/02/02 01:42:47 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop got value #7126
  12/02/02 01:42:47 DEBUG ipc.RPC: Call: getJobStatus 1
  12/02/02 01:42:48 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop sending #7127
  12/02/02 01:42:48 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop got value #7127
  12/02/02 01:42:48 DEBUG ipc.RPC: Call: getTaskCompletionEvents 0
  12/02/02 01:42:48 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop sending #7128
  12/02/02 01:42:48 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop got value #7128
  12/02/02 01:42:48 DEBUG ipc.RPC: Call: getJobStatus 1
  12/02/02 01:42:49 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop sending #7129
  12/02/02 01:42:49 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop got value #7129
  12/02/02 01:42:49 DEBUG ipc.RPC: Call: getTaskCompletionEvents 0
  12/02/02 01:42:49 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop sending #7130
  12/02/02 01:42:49 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop got value #7130
  12/02/02 01:42:49 DEBUG ipc.RPC: Call: getJobStatus 1
  12/02/02 01:42:50 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop sending #7131
  12/02/02 01:42:50 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop got value #7131
  12/02/02 01:42:50 DEBUG ipc.RPC: Call: getTaskCompletionEvents 1
  12/02/02 01:42:50 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop sending #7132
  12/02/02 01:42:50 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop got value #7132
  12/02/02 01:42:50 DEBUG ipc.RPC: Call: getJobStatus 0
  12/02/02 01:42:51 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop sending #7133
  12/02/02 01:42:51 DEBUG ipc.Client: IPC Client (47) connection to localhost/127.0.0.1:9001 from hadoop got value #7133
 
 
  



 --
 Harsh J
 Customer Ops. Engineer
 Cloudera | http://tiny.cloudera.com/about



Re: Too many open files Error

2012-01-26 Thread Idris Ali
Hi Mark,

As Harsh pointed out, it is not a good idea to increase the xceiver count to
an arbitrarily high value; I suggested increasing it only to unblock
execution of your program temporarily.

Thanks,
-Idris

On Fri, Jan 27, 2012 at 10:39 AM, Harsh J ha...@cloudera.com wrote:

 You are technically allowing the DN to run 1 million block transfer
 (in/out) threads by doing that. It does not take up resources by
 default, sure, but it can now be abused with requests that make your DN
 run out of memory and crash, because it is no longer bound to proper limits.

 On Fri, Jan 27, 2012 at 5:49 AM, Mark question markq2...@gmail.com wrote:
  Harsh, could you explain briefly why a 1M setting for xceivers is bad? The
  job is working now ...
  About ulimit -u: it shows 200703, so is that why the connection is reset by
  peer? How come it works with the xceiver modification?
 
  Thanks,
  Mark
 
 
  On Thu, Jan 26, 2012 at 12:21 PM, Harsh J ha...@cloudera.com wrote:
 
  Agree with Raj V here - Your problem should not be the # of transfer
  threads nor the number of open files given that stacktrace.
 
  And the values you've set for the transfer threads are far beyond the
  recommendations of 4k/8k - I would not recommend doing that. The default
  in 1.0.0 is 256, but set it to 2048/4096, which are good values to have
  when noticing increased HDFS load, or when running services like
  HBase.
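  As a sketch, that more conservative setting would go into hdfs-site.xml
  roughly like this (note that the key really is spelled "xcievers" in these
  versions):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>4096</value>
    </property>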
 
  You should instead focus on why it is this particular job (or even this
  particular task, which is important to notice) that fails, and not
  other jobs (or other task attempts).
 
  On Fri, Jan 27, 2012 at 1:10 AM, Raj V rajv...@yahoo.com wrote:
   Mark
  
   You have this "Connection reset by peer". Why do you think this problem
   is related to too many open files?
  
   Raj
  
  
  
  
   From: Mark question markq2...@gmail.com
  To: common-user@hadoop.apache.org
  Sent: Thursday, January 26, 2012 11:10 AM
  Subject: Re: Too many open files Error
  
  Hi again,
  I've tried:

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>1048576</value>
    </property>

  but I'm still getting the same error ... how high can I go??
  
  Thanks,
  Mark
  
  
  
  On Thu, Jan 26, 2012 at 9:29 AM, Mark question markq2...@gmail.com
  wrote:
  
   Thanks for the reply. I have nothing about dfs.datanode.max.xceivers in
   my hdfs-site.xml, so hopefully this will solve the problem. About
   ulimit -n: I'm running on an NFS cluster, so usually I just start Hadoop
   with a single bin/start-all.sh ... Do you think I can add it via
   bin/Datanode -ulimit n ?
  
   Mark
  
  
   On Thu, Jan 26, 2012 at 7:33 AM, Mapred Learn 
 mapred.le...@gmail.com
  wrote:
  
    You need to set ulimit -n to a bigger value on the datanodes and then
    restart the datanodes.
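    A sketch of one common way to do that, assuming the datanode runs as a
    user named "hdfs" (adjust the user name and the limit to your setup):

      # /etc/security/limits.conf
      hdfs  soft  nofile  32768
      hdfs  hard  nofile  32768
      # log in again (or restart the service) and verify with: ulimit -n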
  
   Sent from my iPhone
  
   On Jan 26, 2012, at 6:06 AM, Idris Ali psychid...@gmail.com
 wrote:
  
Hi Mark,
   
 On a lighter note, what is the xceiver count, i.e. the value of the
    dfs.datanode.max.xceivers property in hdfs-site.xml?
   
Thanks,
-idris
   
 On Thu, Jan 26, 2012 at 5:28 PM, Michel Segel michael_se...@hotmail.com wrote:

 Sorry, going from memory...
 As user hadoop or mapred or hdfs, what do you see when you do a ulimit -a?
 That should give you the number of open files allowed for a single user...
   
   
Sent from a remote device. Please excuse any typos...
   
Mike Segel
   
On Jan 26, 2012, at 5:13 AM, Mark question markq2...@gmail.com
 
   wrote:
   
Hi guys,
   
  I get this error from a job trying to process 3 million records.
   
 java.io.IOException: Bad connect ack with firstBadLink 192.168.1.20:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2903)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2826)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2102)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2288)
   
 When I checked the logfile of the datanode-20, I see:

 2012-01-26 03:00:11,827 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(192.168.1.20:50010, storageID=DS-97608578-192.168.1.20-50010-1327575205369, infoPort=50075, ipcPort=50020):DataXceiver
 java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
        at sun.nio.ch.IOUtil.read(IOUtil.java:175)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO

Re: Netstat Shows Port 8020 Doesn't Seem to Listen

2012-01-09 Thread Idris Ali
Hi,

Looks like a problem with starting DFS and MR. Can you run 'jps' and see if the
NN, DN, SNN, JT and TT are running?

Also, for pseudo-distributed mode, make sure the following entries are
present:

1. In core-site.xml:

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:8020</value>
  </property>

  <property>
    <name>hadoop.tmp.dir</name>
    <value>SOME TMP dir with Read/Write access, not the system temp</value>
  </property>

2. In hdfs-site.xml:

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
    <name>dfs.name.dir</name>
    <value>Local dir with Read/Write access</value>
  </property>

3. In mapred-site.xml:

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:8021</value>
  </property>

Thanks,
-Idris

On Tue, Jan 10, 2012 at 1:07 AM, Eli Finkelshteyn iefin...@gmail.com wrote:

 Positive. Like I said before, netstat -a | grep 8020 gives me nothing.
 Even if the firewall was the problem, that should still give me output that
 the port is listening, but I'd just be unable to hit it from an outside box
 (I tested this by blocking port 50070, at which point it still showed up in
 netstat -a, but was inaccessible through http from a remote machine). This
 problem is something else.


 On 1/9/12 2:31 PM, zGreenfelder wrote:

  On Mon, Jan 9, 2012 at 1:58 PM, Eli Finkelshteyn iefin...@gmail.com wrote:

 More info:

 In the DataNode log, I'm also seeing:

 2012-01-09 13:06:27,751 INFO org.apache.hadoop.ipc.Client: Retrying
 connect
 to server: localhost/127.0.0.1:8020. Already tried 9 time(s).

 Why would things just not load on port 8020? I feel like all the errors
 I'm
 seeing are caused by this, but I can't see any errors about why this
 occurred in the first place.

  are you sure there isn't a firewall in place blocking port 8020?
 e.g. iptables on the local machines?   if you do
 telnet localhost 8020
 do you make a connection? if you use lsof and/or netstat can you see
 the port open?
 if you have root access you can try turning off the firewall with
 iptables -F to see if things work without firewall rules.
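  For instance, a quick pass over those checks could look like this (standard
  Linux tools; adjust for your distribution):

    sudo netstat -tlnp | grep 8020      # is anything listening on 8020, and which pid?
    sudo lsof -iTCP:8020 -sTCP:LISTEN   # the same question via lsof
    telnet localhost 8020               # does a connection open at all?
    sudo iptables -L -n                 # inspect the firewall rules
    sudo iptables -F                    # flush them temporarily (testing only)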





Hadoop startup error - Mac OS - JDK

2011-12-16 Thread Idris Ali
Hi,

I am getting the error below since my last Java update. I am using Mac OS
10.7.2 and CDH3u0.
I tried with OpenJDK 1.7 and Sun JDK 1.6.0_29. I used to run Oozie with
Hadoop until the last update.
Any help is appreciated. I can send the details of core-site.xml as well.

Thanks,
-Idris

2011-12-16 20:07:50.009 java[92609:1d03] Unable to load realm mapping info
from SCDynamicStore
2011-12-16 20:08:09.245 java[92609:1107] Unable to load realm mapping info
from SCDynamicStore
2011-12-16 20:08:09.246 java[92609:1107] *** Terminating app due to
uncaught exception 'JavaNativeException', reason: 'KrbException: Could not
load configuration from SCDynamicStore'
*** First throw call stack:
(
0   CoreFoundation        0x7fff98147286 __exceptionPreprocess + 198
1   libobjc.A.dylib       0x7fff98519d5e objc_exception_throw + 43
2   CoreFoundation        0x7fff981d14c9 -[NSException raise] + 9
3   JavaNativeFoundation  0x000106dacc47 JNFCallStaticVoidMethod + 213
4   libosx.dylib          0x000107b0c184 ___SCDynamicStoreCallBack_block_invoke_1 + 24
5   JavaNativeFoundation  0x000106daf18a JNFPerformEnvBlock + 86
6   SystemConfiguration   0x7fff8e4d1d50 rlsPerform + 119
7   CoreFoundation        0x7fff980b5b51 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
8   CoreFoundation        0x7fff980b53bd __CFRunLoopDoSources0 + 253
9   CoreFoundation        0x7fff980dc1a9 __CFRunLoopRun + 905
10  CoreFoundation        0x7fff980dbae6 CFRunLoopRunSpecific + 230
11  java                  0x000101d83eb1 CreateExecutionEnvironment + 841
12  java                  0x000101d7fecd JLI_Launch + 1933
13  java                  0x000101d85c2d main + 108
14  java                  0x000101d7f738 start + 52
15  ???                   0x0014 0x0 + 20
)
terminate called throwing an exception


Re: Hadoop startup error - Mac OS - JDK

2011-12-16 Thread Idris Ali
Thanks Uma,

I have been using that parameter to avoid the SCDynamicStore error, and things
were fine until the 1.6.0_26 Sun JDK update.

But with the new update nothing seems to work.

-Idris

On Fri, Dec 16, 2011 at 8:24 PM, Uma Maheswara Rao G
mahesw...@huawei.com wrote:

 A workaround is available in
 https://issues.apache.org/jira/browse/HADOOP-7489 -
 try adding those options in hadoop-env.sh.
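 For reference, the kind of option that JIRA discusses looks roughly like the
 following in hadoop-env.sh (treat this as a sketch and check the JIRA for the
 exact flags that apply to your setup):

   export HADOOP_OPTS="$HADOOP_OPTS -Djava.security.krb5.realm= -Djava.security.krb5.kdc="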

 Regards,
 Uma
 
 From: Idris Ali [psychid...@gmail.com]
 Sent: Friday, December 16, 2011 8:16 PM
 To: common-user@hadoop.apache.org; oozie-us...@incubator.apache.org
 Subject: Hadoop startup error - Mac OS - JDK

 Hi,

 I am getting the error below since my last Java update. I am using Mac OS
 10.7.2 and CDH3u0.
 I tried with OpenJDK 1.7 and Sun JDK 1.6.0_29. I used to run Oozie with
 Hadoop until the last update.
 Any help is appreciated. I can send the details of core-site.xml as well.

 Thanks,
 -Idris




Re: Hadoop Comic

2011-12-07 Thread Idris Ali
Hi Maneesh,

Great work! Without your comic strip, it took me a week of reading articles on
Hadoop and going through the Definitive Guide to understand these concepts.

Cheers!
-Idris

On Thu, Dec 8, 2011 at 12:32 AM, maneesh varshney mvarsh...@gmail.com wrote:

 Hi Jaganadh

 I am the author of this comic strip. Please feel free to re-distribute it
 as you see fit. I license the content under Creative Commons Attribution
 Share-Alike.

 Would also like to thank everybody for the nice feedback and encouragement!
 This was a little experiment on my part to see if a visual format can be used
 to explain protocols and algorithms. I am teaching myself WordPress these
 days so that I can host other comics on this and other topics in computer
 science.

 Thanks
 -Maneesh


 On Wed, Dec 7, 2011 at 2:06 AM, JAGANADH G jagana...@gmail.com wrote:

  Hi
  Is it re-distributable or shareable in a blog or the like?
 
 
  --
  **
  JAGANADH G
  http://jaganadhg.in
  *ILUGCBE*
  http://ilugcbe.psgkriya.org