Re: Best Linux Operating system used for Hadoop

2012-01-27 Thread Sujit Dhamale
Thanks a lot, Alex.
I will install RHEL today.

--Sujit Dhamale

On Fri, Jan 27, 2012 at 2:49 PM, alo alt  wrote:

> I suggest CentOS 5.7 / RHEL 5.7
>
> CentOS 6.2 runs also stable
>
> - Alex
>
> --
> Alexander Lorenz
> http://mapredit.blogspot.com
>
> On Jan 27, 2012, at 10:15 AM, Sujit Dhamale wrote:
>
> > Hi All,
> > I am new to Hadoop,
> > Can anyone tell me which is the best Linux operating system for
> > installing and running Hadoop?
> > These days I am using Ubuntu 11.04; I installed Hadoop on it, but it
> > crashes a number of times.
> >
> > Can someone please help me out?
> >
> >
> > Kind regards
> > Sujit Dhamale
>
>


Re: getting NullPointerException while running the word count example

2012-03-06 Thread Sujit Dhamale
Hadoop version : hadoop-0.20.203.0rc1.tar
Operating System : Ubuntu 11.10


On Wed, Mar 7, 2012 at 12:19 AM, Harsh J  wrote:

> Hi Sujit,
>
> Please also tell us which version/distribution of Hadoop is this?
>
> On Tue, Mar 6, 2012 at 11:27 PM, Sujit Dhamale 
> wrote:
> > Hi,
> >
> > I am new to Hadoop. I installed Hadoop as per
> >
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
> >
> >
> > While running the word count example I am getting a NullPointerException.
> >
> > Can someone please look into this issue?
> >
> > Thanks in advance!
> >
> >
> >
> > hduser@sujit:~/Desktop/hadoop$ bin/hadoop dfs -ls /user/hduser/data
> > Found 3 items
> > -rw-r--r--   1 hduser supergroup 674566 2012-03-06 23:04
> > /user/hduser/data/pg20417.txt
> > -rw-r--r--   1 hduser supergroup1573150 2012-03-06 23:04
> > /user/hduser/data/pg4300.txt
> > -rw-r--r--   1 hduser supergroup1423801 2012-03-06 23:04
> > /user/hduser/data/pg5000.txt
> >
> > hduser@sujit:~/Desktop/hadoop$ bin/hadoop jar hadoop*examples*.jar
> > wordcount /user/hduser/data /user/hduser/gutenberg-outputd
> >
> > 12/03/06 23:14:33 INFO input.FileInputFormat: Total input paths to
> process
> > : 3
> > 12/03/06 23:14:33 INFO mapred.JobClient: Running job:
> job_201203062221_0002
> > 12/03/06 23:14:34 INFO mapred.JobClient:  map 0% reduce 0%
> > 12/03/06 23:14:49 INFO mapred.JobClient:  map 66% reduce 0%
> > 12/03/06 23:14:55 INFO mapred.JobClient:  map 100% reduce 0%
> > 12/03/06 23:14:58 INFO mapred.JobClient: Task Id :
> > attempt_201203062221_0002_r_00_0, Status : FAILED
> > Error: java.lang.NullPointerException
> >at
> > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)
> >
> > 12/03/06 23:15:07 INFO mapred.JobClient: Task Id :
> > attempt_201203062221_0002_r_00_1, Status : FAILED
> > Error: java.lang.NullPointerException
> >at
> > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)
> >
> > 12/03/06 23:15:16 INFO mapred.JobClient: Task Id :
> > attempt_201203062221_0002_r_00_2, Status : FAILED
> > Error: java.lang.NullPointerException
> >at
> > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
> >at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)
> >
> > 12/03/06 23:15:31 INFO mapred.JobClient: Job complete:
> job_201203062221_0002
> > 12/03/06 23:15:31 INFO mapred.JobClient: Counters: 20
> > 12/03/06 23:15:31 INFO mapred.JobClient:   Job Counters
> > 12/03/06 23:15:31 INFO mapred.JobClient: Launched reduce tasks=4
> > 12/03/06 23:15:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22084
> > 12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all
> > reduces waiting after reserving slots (ms)=0
> > 12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all maps
> > waiting after reserving slots (ms)=0
> > 12/03/06 23:15:31 INFO mapred.JobClient: Launched map tasks=3
> > 12/03/06 23:15:31 INFO mapred.JobClient: Data-local map tasks=3
> > 12/03/06 23:15:31 INFO mapred.JobClient: Failed reduce tasks=1
> > 12/03/06 23:15:31 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=16799
> > 12/03/06 23:15:31 INFO mapred.JobClient:   FileSystemCounters
> > 12/03/06 23:15:31 INFO mapred.JobClient: FILE_BYTES_READ=740520
> > 12/03/06 23:15:31 INFO mapred.JobClient: HDFS_BYTES_READ=3671863
> > 12/03/06 23:15:31 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2278287
> > 12/03/06 23:15:31 INFO mapred.JobClient:   File Input Format Counters
> > 12/03/06 23:15:31 INFO mapred.JobClient: Bytes Read=3671517
> > 12/03/06 23:15:31 INFO mapred.JobClient:   Map-Reduce Framework
> >

getting NullPointerException while running the word count example

2012-03-07 Thread Sujit Dhamale
Hi,

I am new to Hadoop. I installed Hadoop as per
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/


While running the word count example I am getting a NullPointerException.

Can someone please look into this issue?

Thanks in advance!

Hadoop version : hadoop-0.20.203.0rc1.tar
Operating System : Ubuntu 11.10



hduser@sujit:~/Desktop/hadoop$ bin/hadoop dfs -ls /user/hduser/data
Found 3 items
-rw-r--r--   1 hduser supergroup 674566 2012-03-06 23:04
/user/hduser/data/pg20417.txt
-rw-r--r--   1 hduser supergroup1573150 2012-03-06 23:04
/user/hduser/data/pg4300.txt
-rw-r--r--   1 hduser supergroup1423801 2012-03-06 23:04
/user/hduser/data/pg5000.txt

hduser@sujit:~/Desktop/hadoop$ bin/hadoop jar hadoop*examples*.jar
wordcount /user/hduser/data /user/hduser/gutenberg-outputd

12/03/06 23:14:33 INFO input.FileInputFormat: Total input paths to process
: 3
12/03/06 23:14:33 INFO mapred.JobClient: Running job: job_201203062221_0002
12/03/06 23:14:34 INFO mapred.JobClient:  map 0% reduce 0%
12/03/06 23:14:49 INFO mapred.JobClient:  map 66% reduce 0%
12/03/06 23:14:55 INFO mapred.JobClient:  map 100% reduce 0%
12/03/06 23:14:58 INFO mapred.JobClient: Task Id :
attempt_201203062221_0002_r_00_0, Status : FAILED
Error: java.lang.NullPointerException
at
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)

12/03/06 23:15:07 INFO mapred.JobClient: Task Id :
attempt_201203062221_0002_r_00_1, Status : FAILED
Error: java.lang.NullPointerException
at
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)

12/03/06 23:15:16 INFO mapred.JobClient: Task Id :
attempt_201203062221_0002_r_00_2, Status : FAILED
Error: java.lang.NullPointerException
at
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)

12/03/06 23:15:31 INFO mapred.JobClient: Job complete: job_201203062221_0002
12/03/06 23:15:31 INFO mapred.JobClient: Counters: 20
12/03/06 23:15:31 INFO mapred.JobClient:   Job Counters
12/03/06 23:15:31 INFO mapred.JobClient: Launched reduce tasks=4
12/03/06 23:15:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22084
12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all
reduces waiting after reserving slots (ms)=0
12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all maps
waiting after reserving slots (ms)=0
12/03/06 23:15:31 INFO mapred.JobClient: Launched map tasks=3
12/03/06 23:15:31 INFO mapred.JobClient: Data-local map tasks=3
12/03/06 23:15:31 INFO mapred.JobClient: Failed reduce tasks=1
12/03/06 23:15:31 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=16799
12/03/06 23:15:31 INFO mapred.JobClient:   FileSystemCounters
12/03/06 23:15:31 INFO mapred.JobClient: FILE_BYTES_READ=740520
12/03/06 23:15:31 INFO mapred.JobClient: HDFS_BYTES_READ=3671863
12/03/06 23:15:31 INFO mapred.JobClient: FILE_BYTES_WRITTEN=2278287
12/03/06 23:15:31 INFO mapred.JobClient:   File Input Format Counters
12/03/06 23:15:31 INFO mapred.JobClient: Bytes Read=3671517
12/03/06 23:15:31 INFO mapred.JobClient:   Map-Reduce Framework
12/03/06 23:15:31 INFO mapred.JobClient: Map output materialized
bytes=1474341
12/03/06 23:15:31 INFO mapred.JobClient: Combine output records=102322
12/03/06 23:15:31 INFO mapred.JobClient: Map input records=77932
12/03/06 23:15:31 INFO mapred.JobClient: Spilled Records=153640
12/03/06 23:15:31 INFO mapred.JobClient: Map output bytes=6076095
12/03/06 23:15:31 INFO mapred.JobClient: Combine input records=629172
12/03/06 23:15:31 INFO mapred.JobClient: Map output records=629172
12/03/06 23:15:31 INFO mapred.JobClient: SPLIT_RAW_BYTES=346
hduser@sujit:~/Desktop/hadoop$
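
One thing worth ruling out first on a single-node setup like this is host-name
resolution -- the later threads in this archive trace similar failures back to
/etc/hosts. A quick check (the host name and addresses below are illustrative;
substitute your own):

hduser@sujit:~$ hostname
hduser@sujit:~$ cat /etc/hosts
# A minimal single-node mapping usually looks like:
#   127.0.0.1   localhost
#   127.0.1.1   sujit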


Re: Image Processing in Hadoop

2012-04-02 Thread Sujit Dhamale
Shreya, can you please explain your scenario?


On Mon, Apr 2, 2012 at 3:02 PM,  wrote:

>
>
> Hi,
>
>
>
> Can someone point me to some info on Image processing using Hadoop?
>
>
>
> Regards,
>
> Shreya
>
>
>


Re: getting NullPointerException while running the word count example

2012-04-02 Thread Sujit Dhamale
Can someone please look into the issue below?
Thanks in advance.

On Wed, Mar 7, 2012 at 9:09 AM, Sujit Dhamale wrote:

> Hadoop version : hadoop-0.20.203.0rc1.tar
> Operating System : Ubuntu 11.10
>
>
>
> On Wed, Mar 7, 2012 at 12:19 AM, Harsh J  wrote:
>
>> Hi Sujit,
>>
>> Please also tell us which version/distribution of Hadoop is this?
>>
>> On Tue, Mar 6, 2012 at 11:27 PM, Sujit Dhamale 
>> wrote:
>> > Hi,
>> >
>> > I am new to Hadoop. I installed Hadoop as per
>> >
>> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
>> >
>> >
>> > While running the word count example I am getting a NullPointerException.
>> >
>> > Can someone please look into this issue?
>> >
>> > Thanks in advance!
>> >
>> >
>> >
>> > hduser@sujit:~/Desktop/hadoop$ bin/hadoop dfs -ls /user/hduser/data
>> > Found 3 items
>> > -rw-r--r--   1 hduser supergroup 674566 2012-03-06 23:04
>> > /user/hduser/data/pg20417.txt
>> > -rw-r--r--   1 hduser supergroup1573150 2012-03-06 23:04
>> > /user/hduser/data/pg4300.txt
>> > -rw-r--r--   1 hduser supergroup1423801 2012-03-06 23:04
>> > /user/hduser/data/pg5000.txt
>> >
>> > hduser@sujit:~/Desktop/hadoop$ bin/hadoop jar hadoop*examples*.jar
>> > wordcount /user/hduser/data /user/hduser/gutenberg-outputd
>> >
>> > 12/03/06 23:14:33 INFO input.FileInputFormat: Total input paths to
>> process
>> > : 3
>> > 12/03/06 23:14:33 INFO mapred.JobClient: Running job:
>> job_201203062221_0002
>> > 12/03/06 23:14:34 INFO mapred.JobClient:  map 0% reduce 0%
>> > 12/03/06 23:14:49 INFO mapred.JobClient:  map 66% reduce 0%
>> > 12/03/06 23:14:55 INFO mapred.JobClient:  map 100% reduce 0%
>> > 12/03/06 23:14:58 INFO mapred.JobClient: Task Id :
>> > attempt_201203062221_0002_r_00_0, Status : FAILED
>> > Error: java.lang.NullPointerException
>> >at
>> > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>> >at
>> >
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
>> >at
>> >
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)
>> >
>> > 12/03/06 23:15:07 INFO mapred.JobClient: Task Id :
>> > attempt_201203062221_0002_r_00_1, Status : FAILED
>> > Error: java.lang.NullPointerException
>> >at
>> > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>> >at
>> >
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
>> >at
>> >
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)
>> >
>> > 12/03/06 23:15:16 INFO mapred.JobClient: Task Id :
>> > attempt_201203062221_0002_r_00_2, Status : FAILED
>> > Error: java.lang.NullPointerException
>> >at
>> > java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:768)
>> >at
>> >
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.getMapCompletionEvents(ReduceTask.java:2900)
>> >at
>> >
>> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$GetMapEventsThread.run(ReduceTask.java:2820)
>> >
>> > 12/03/06 23:15:31 INFO mapred.JobClient: Job complete:
>> job_201203062221_0002
>> > 12/03/06 23:15:31 INFO mapred.JobClient: Counters: 20
>> > 12/03/06 23:15:31 INFO mapred.JobClient:   Job Counters
>> > 12/03/06 23:15:31 INFO mapred.JobClient: Launched reduce tasks=4
>> > 12/03/06 23:15:31 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=22084
>> > 12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all
>> > reduces waiting after reserving slots (ms)=0
>> > 12/03/06 23:15:31 INFO mapred.JobClient: Total time spent by all
>> maps
>> > waiting after reserving slots (ms)=0
>> > 12/03/06 23:15:31 INFO mapred.JobClient: Launched map tasks=3
>> > 12/03/06 23:15:31 INFO mapred.JobClient: Data-local map tasks=3
>> > 12/03/06 23:15:31 INFO mapred.JobClient: Failed reduce tasks=1
>> > 12/03/06 23:15:31 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=16799
>> > 1

Data Node is not Started

2012-04-06 Thread Sujit Dhamale
Hi all,
My DataNode is not starting.

Even after deleting the hadoop*.pid files from /tmp, the DataNode still does not
start.


Hadoop Version: hadoop-1.0.1.tar.gz
Java version : java version "1.6.0_26
Operating System : Ubuntu 11.10


I followed the procedure below:


*hduser@sujit:~/Desktop/hadoop/bin$ jps*
11455 Jps


*hduser@sujit:~/Desktop/hadoop/bin$ start-all.sh*
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-namenode-sujit.out
localhost: starting datanode, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-datanode-sujit.out
localhost: starting secondarynamenode, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-secondarynamenode-sujit.out
starting jobtracker, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-jobtracker-sujit.out
localhost: starting tasktracker, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-tasktracker-sujit.out

*hduser@sujit:~/Desktop/hadoop/bin$ jps*
11528 NameNode
12019 SecondaryNameNode
12355 TaskTracker
12115 JobTracker
12437 Jps


*hduser@sujit:~/Desktop/hadoop/bin$ stop-all.sh*
Warning: $HADOOP_HOME is deprecated.

stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: no datanode to stop
localhost: stopping secondarynamenode


*hduser@sujit:~/Desktop/hadoop/bin$ jps*
13127 Jps


*hduser@sujit:~/Desktop/hadoop/bin$ ls /tmp*
hadoop-hduser-datanode.pid
hsperfdata_hduserkeyring-meecr7
ssh-JXYCAJsX1324
hadoop-hduser-jobtracker.pid
hsperfdata_sujit plugtmp
unity_support_test.0
hadoop-hduser-namenode.pid
Jetty_0_0_0_0_50030_jobyn7qmkpulse-2L9K88eMlGn7
virtual-hduser.Q8j5nJ
hadoop-hduser-secondarynamenode.pid
Jetty_0_0_0_0_50070_hdfsw2cu08   pulse-Ob9vyJcXyHZz
hadoop-hduser-tasktracker.pid
Jetty_0_0_0_0_50090_secondaryy6aanv  pulse-PKdhtXMmr18n

Deleted the *.pid files :)

hduser@sujit:~$ ls /tmp
hsperfdata_hduserpulse-2L9K88eMlGn7
hsperfdata_sujit pulse-Ob9vyJcXyHZz
Jetty_0_0_0_0_50030_jobyn7qmkpulse-PKdhtXMmr18n
Jetty_0_0_0_0_50070_hdfsw2cu08   ssh-JXYCAJsX1324
Jetty_0_0_0_0_50090_secondaryy6aanv  unity_support_test.0
keyring-meecr7   virtual-hduser.Q8j5nJ
plugtmp





*hduser@sujit:~/Desktop/hadoop$ bin/hadoop namenode -format*
Warning: $HADOOP_HOME is deprecated.

12/04/06 23:23:22 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = sujit.(null)/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 1.0.1
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
************************************************************/
Re-format filesystem in /app/hadoop/tmp/dfs/name ? (Y or N) Y
12/04/06 23:23:25 INFO util.GSet: VM type   = 32-bit
12/04/06 23:23:25 INFO util.GSet: 2% max memory = 17.77875 MB
12/04/06 23:23:25 INFO util.GSet: capacity  = 2^22 = 4194304 entries
12/04/06 23:23:25 INFO util.GSet: recommended=4194304, actual=4194304
12/04/06 23:23:25 INFO namenode.FSNamesystem: fsOwner=hduser
12/04/06 23:23:25 INFO namenode.FSNamesystem: supergroup=supergroup
12/04/06 23:23:25 INFO namenode.FSNamesystem: isPermissionEnabled=true
12/04/06 23:23:25 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
12/04/06 23:23:25 INFO namenode.FSNamesystem: isAccessTokenEnabled=false
accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
12/04/06 23:23:25 INFO namenode.NameNode: Caching file names occuring more
than 10 times
12/04/06 23:23:26 INFO common.Storage: Image file of size 112 saved in 0
seconds.
12/04/06 23:23:26 INFO common.Storage: Storage directory
/app/hadoop/tmp/dfs/name has been successfully formatted.
12/04/06 23:23:26 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at sujit.(null)/127.0.1.1
************************************************************/
hduser@sujit:~/Desktop/hadoop$ bin/start-all.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-namenode-sujit.out
localhost: starting datanode, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-datanode-sujit.out
localhost: starting secondarynamenode, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-secondarynamenode-sujit.out
starting jobtracker, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-jobtracker-sujit.out
localhost: starting tasktracker, logging to
/home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-tasktracker-sujit.out


*hduser@sujit:~/Desktop/hadoop$ jps*
14157 JobTracker
14492 Jps
14397 TaskTracker
14063 SecondaryNameNode
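
Since jps shows every daemon except the DataNode, the next thing to look at is the
datanode log itself, which is what the follow-up below does. The path comes from the
start-all.sh output above; note that the detailed errors go to the .log file rather
than the .out file:

hduser@sujit:~/Desktop/hadoop$ tail -n 50 logs/hadoop-hduser-datanode-sujit.log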

Re: Data Node is not Started

2012-04-06 Thread Sujit Dhamale
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)

2012-04-06 23:12:13,767 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at sujit.(null)/127.0.1.1
************************************************************/
2012-04-06 23:23:46,591 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = sujit.(null)/127.0.1.1
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.1
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
************************************************************/
2012-04-06 23:23:46,747 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
loaded properties from hadoop-metrics2.properties
2012-04-06 23:23:46,759 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
MetricsSystem,sub=Stats registered.
2012-04-06 23:23:46,760 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
period at 10 second(s).
2012-04-06 23:23:46,760 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system
started
2012-04-06 23:23:46,913 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
registered.
2012-04-06 23:23:46,919 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
exists!
2012-04-06 23:23:48,136 INFO org.apache.hadoop.ipc.Client: Retrying connect
to server: localhost/127.0.0.1:54310. Already tried 0 time(s).
2012-04-06 23:23:53,122 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
Incompatible namespaceIDs in /app/hadoop/tmp/dfs/data: namenode namespaceID
= 1320262146; datanode namespaceID = 1269725409
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
at
org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:385)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:299)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1582)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1521)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1539)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1665)
at
org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1682)

2012-04-06 23:23:53,124 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at sujit.(null)/127.0.1.1
************************************************************/
hduser@sujit:~/Desktop/hadoop/logs$




On Apr 6, 2012 at 11:46 PM, Prashant Kommireddi wrote:

> Can you check the datanode logs? Maybe it's an incompatible namespace issue.
>
> On Apr 6, 2012, at 11:13 AM, Sujit Dhamale 
> wrote:
>
> > Hi all,
> > My DataNode is not starting.
> >
> > Even after deleting the hadoop*.pid files from /tmp, the DataNode still does
> > not start.
> >
> >
> > Hadoop Version: hadoop-1.0.1.tar.gz
> > Java version : java version "1.6.0_26
> > Operating System : Ubuntu 11.10
> >
> >
> > I followed the procedure below:
> >
> >
> > *hduser@sujit:~/Desktop/hadoop/bin$ jps*
> > 11455 Jps
> >
> >
> > *hduser@sujit:~/Desktop/hadoop/bin$ start-all.sh*
> > Warning: $HADOOP_HOME is deprecated.
> >
> > starting namenode, logging to
> >
> /home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-namenode-sujit.out
> > localhost: starting datanode, logging to
> >
> /home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-datanode-sujit.out
> > localhost: starting secondarynamenode, logging to
> >
> /home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-secondarynamenode-sujit.out
> > starting jobtracker, logging to
> >
> /home/hduser/Desktop/hadoop/libexec/../logs/hadoop-hduser-jobtrack
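
The "Incompatible namespaceIDs" error above is what shows up when the namenode has
been reformatted while the datanode still holds data from the old filesystem. The two
usual ways out, sketched here with the data directory taken from the log
(/app/hadoop/tmp/dfs/data) -- note that the first option wipes any HDFS block data on
this node:

# Option 1: clear the datanode storage so it re-registers with the newly
# formatted namenode (destroys existing HDFS data on this node).
bin/stop-all.sh
rm -rf /app/hadoop/tmp/dfs/data/*
bin/start-all.sh

# Option 2: keep the blocks and edit the namespaceID line in
# /app/hadoop/tmp/dfs/data/current/VERSION to match the namenode's value
# (1320262146 in the log above), then restart the cluster.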

Re: Mapping is not happening.

2012-04-09 Thread Sujit Dhamale
Hi friends,
I am not able to run the word count example.
Please look into the issue below.

I am not able to find what exactly the issue is, and I am not able to troubleshoot it.

Please help me out :)

Thanks in advance.

Kind Regards
Sujit

On Mon, Apr 9, 2012 at 8:56 AM, Sujit Dhamale wrote:

> Hi all,
> I did all the Hadoop installation and configuration, but while executing the
> word count program, mapping is not happening and
> I am not getting a result.
> I get the output below while executing the program:
>
> sujit@sujit:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.0.1.jar
> wordcount /input /output
> hdfs://localhost:54310/
> input
> 12/04/08 20:58:04 INFO input.FileInputFormat: Total input paths to process
> : 1
> 12/04/08 20:58:04 INFO mapred.JobClient: Running job: job_201204082039_0002
> 12/04/08 20:58:05 INFO mapred.JobClient:  map 0% reduce 0%
> 12/04/08 20:58:05 INFO mapred.JobClient: Job complete:
> job_201204082039_0002
> 12/04/08 20:58:05 INFO mapred.JobClient: Counters: 0
>
>
>
>
> Hadoop Version: hadoop-1.0.1.tar.gz
> java version "1.6.0_30"
> Operating System : Ubuntu 11.10
>
> Below are the steps to reproduce:
>
> sujit@sujit:~$ cd /usr/local/hadoop
>
> *sujit@sujit:/usr/local/hadoop$ bin/start-all.sh *
> Warning: $HADOOP_HOME is deprecated.
> starting namenode, logging to
> /usr/local/hadoop/libexec/../logs/hadoop-sujit-namenode-sujit.out
> localhost: starting datanode, logging to
> /usr/local/hadoop/libexec/../logs/hadoop-sujit-datanode-sujit.out
> localhost: starting secondarynamenode, logging to
> /usr/local/hadoop/libexec/../logs/hadoop-sujit-secondarynamenode-sujit.out
> starting jobtracker, logging to
> /usr/local/hadoop/libexec/../logs/hadoop-sujit-jobtracker-sujit.out
> localhost: starting tasktracker, logging to
> /usr/local/hadoop/libexec/../logs/hadoop-sujit-tasktracker-sujit.out
>
> *sujit@sujit:/usr/local/hadoop$ jps*
> 7297 TaskTracker
> 6720 DataNode
> 6485 NameNode
> 7411 Jps
> 7061 JobTracker
> 6969 SecondaryNameNode
>
> *sujit@sujit:/usr/local/hadoop$ bin/hadoop fs -cat /input/input.txt*
> Warning: $HADOOP_HOME is deprecated.
>
> Hello World
> This is Sujit
> Sujit Dhamale
> Hello
>
>
> *sujit@sujit:/usr/local/hadoop$ bin/hadoop dfs -lsr /*
> Warning: $HADOOP_HOME is deprecated.
>
> drwxr-xr-x   - sujit supergroup  0 2012-04-08 20:20 /app
> drwxr-xr-x   - sujit supergroup  0 2012-04-08 20:20 /app/hadoop
> drwxr-xr-x   - sujit supergroup  0 2012-04-08 20:20 /app/hadoop/tmp
> drwxr-xr-x   - sujit supergroup  0 2012-04-08 20:40
> /app/hadoop/tmp/mapred
> drwxr-xr-x   - sujit supergroup  0 2012-04-08 20:31
> /app/hadoop/tmp/mapred/staging
> drwxr-xr-x   - sujit supergroup  0 2012-04-08 20:31
> /app/hadoop/tmp/mapred/staging/sujit
> drwx--   - sujit supergroup  0 2012-04-08 20:34
> /app/hadoop/tmp/mapred/staging/sujit/.staging
> drwx--   - sujit supergroup  0 2012-04-08 20:31
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0001
> -rw-r--r--  10 sujit supergroup 142465 2012-04-08 20:31
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0001/job.jar
> -rw-r--r--  10 sujit supergroup110 2012-04-08 20:31
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0001/job.split
> -rw-r--r--   1 sujit supergroup 26 2012-04-08 20:31
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0001/job.splitmetainfo
> -rw-r--r--   1 sujit supergroup  20238 2012-04-08 20:31
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0001/job.xml
> drwx--   - sujit supergroup  0 2012-04-08 20:34
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0002
> -rw-r--r--  10 sujit supergroup 142465 2012-04-08 20:34
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0002/job.jar
> drwx--   - sujit supergroup  0 2012-04-08 20:34
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0003
> -rw-r--r--  10 sujit supergroup 142465 2012-04-08 20:34
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0003/job.jar
> -rw-r--r--  10 sujit supergroup110 2012-04-08 20:34
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0003/job.split
> -rw-r--r--   1 sujit supergroup 26 2012-04-08 20:34
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0003/job.splitmetainfo
> -rw-r--r--   1 sujit supergroup  20242 2012-04-08 20:34
> /app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204082020_0003/job.xml
> drwx--   - sujit supergroup  0 2012-04-08 20:40
> /app/hadoop/tmp/mapred/system
> -rw---   1 
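
When the client jumps straight from "map 0% reduce 0%" to "Job complete" with
"Counters: 0", the job usually failed during initialization on the JobTracker, so the
real reason is in the JobTracker web UI or log rather than in the client output. A
couple of places to look (job id from the output above; the log path assumes the
/usr/local/hadoop install used here):

# The JobTracker web UI shows a per-job "Failure Info" line:
#   http://localhost:50030/jobtracker.jsp
sujit@sujit:/usr/local/hadoop$ bin/hadoop job -status job_201204082039_0002
sujit@sujit:/usr/local/hadoop$ tail -n 100 logs/hadoop-sujit-jobtracker-sujit.log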

Re: getting UnknownHostException

2012-04-15 Thread Sujit Dhamale
Hi Madhu,
After making the modification in /etc/hosts it's working fine.

Thanks a lot :)

Kind Regards
Sujit Dhamale
(+91 9970086652)

On Fri, Apr 13, 2012 at 10:49 AM, madhu phatak  wrote:

> Please check contents of /etc/hosts for the hostname and ipaddress mapping.
>
> On Thu, Apr 12, 2012 at 11:11 PM, Sujit Dhamale  >wrote:
>
> > Hi Friends ,
> > I am getting an UnknownHostException while executing the Hadoop word count
> > program.
> >
> > getting below details from job tracker Web page
> >
> > *User:* sujit
> > *Job Name:* word count
> > *Job File:*
> >
> >
> hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204112234_0002/job.xml<
> > http://localhost:50030/jobconf.jsp?jobid=job_201204112234_0002>
> > *Submit Host:* sujit.(null)
> > *Submit Host Address:* 127.0.1.1
> > *Job-ACLs: All users are allowed*
> > *Job Setup:*None
> > *Status:* Failed
> > *Failure Info:*Job initialization failed: java.net.UnknownHostException:
> > sujit.(null) is not a valid Inet address at org.apache.hadoop.net.
> > NetUtils.verifyHostnames(NetUtils.java:569) at
> > org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:711)
> at
> > org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4207) at
> >
> >
> org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
> > at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> > *Started at:* Wed Apr 11 22:36:46 IST 2012
> > *Failed at:* Wed Apr 11 22:36:47 IST 2012
> > *Failed in:* 0sec
> > *Job Cleanup:*None
> >
> >
> >
> >
> > Can someone help me resolve this issue?
> > I tried http://wiki.apache.org/hadoop/UnknownHost
> >
> > but I am still not able to resolve the issue.
> > Please help me out.
> >
> >
> > Hadoop Version: hadoop-1.0.1.tar.gz
> > java version "1.6.0_30"
> > Operating System : Ubuntu 11.10
> >
> >
> > *Note *: All node were up before starting execution of Program
> >
> > Kind Regards
> > Sujit Dhamale
> >
>
>
>
> --
> https://github.com/zinnia-phatak-dev/Nectar
>
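
For anyone hitting the same "sujit.(null) is not a valid Inet address" failure: the
change that fixed it here was the host-name mapping in /etc/hosts. A minimal
single-node mapping looks roughly like this (host name and addresses are
illustrative -- use your machine's actual host name):

127.0.0.1   localhost
127.0.1.1   sujit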


Re: Hadoop with Sharded MySql

2012-05-31 Thread Sujit Dhamale
Hi,
Instead of pulling the 70K tables from MySQL into HDFS,
take a dump of all 30 tables and put them into an HBase database.

If you pull the 70K tables from MySQL into HDFS you will need to use Hive, but
modification will not be possible in Hive :(

@common-user: please correct me if I am wrong.

Kind Regards
Sujit Dhamale
(+91 9970086652)
On Fri, Jun 1, 2012 at 5:42 AM, Edward Capriolo wrote:

> Maybe you can do some VIEWs or unions or merge tables on the mysql
> side to overcome the aspect of launching so many sqoop jobs.
>
> On Thu, May 31, 2012 at 6:02 PM, Srinivas Surasani
>  wrote:
> > All,
> >
> > We are trying to implement sqoop in our environment which has 30 mysql
> > sharded databases and all the databases have around 30 databases with
> > 150 tables in each of the database which are all sharded (horizontally
> > sharded that means the data is divided into all the tables in mysql).
> >
> > The problem is that we have a total of around 70K tables which needed
> > to be pulled from mysql into hdfs.
> >
> > So, my question is that generating 70K sqoop commands and running them
> > parallel is feasible or not?
> >
> > Also, doing incremental updates is going to be like invoking 70K
> > another sqoop jobs which intern kick of map-reduce jobs.
> >
> > The main problem is monitoring and managing this huge number of jobs?
> >
> > Can anyone suggest me the best way of doing it or is sqoop a good
> > candidate for this type of scenario?
> >
> > Currently the same process is done by generating tsv files from the mysql
> > server and dumped into staging server and  from there we'll generate
> > hdfs put statements..
> >
> > Appreciate your suggestions !!!
> >
> >
> > Thanks,
> > Srinivas Surasani
>
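
If the HBase route suggested above is taken, Sqoop can import a MySQL table directly
into an HBase table, one import job per table. A rough sketch for a single table --
the connection URL, credentials, and table/column-family names are placeholders, not
values from this thread:

sqoop import \
  --connect jdbc:mysql://mysql-host:3306/shard_db_01 \
  --username dbuser -P \
  --table customers \
  --hbase-table customers \
  --column-family d \
  --hbase-row-key id \
  --hbase-create-table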


Re: The NCDC Weather Data for Hadoop the Definitive Guide

2012-11-15 Thread Sujit Dhamale
Hi,
If needed, you can run the script below to store the data on your local system:

for i in {1901..2012}
do
  cd /home/ubuntu/work/
  # -nH drops the host-name directory, so the files land under pub/data/noaa/$i/
  wget -r -np -nH -R index.html http://ftp3.ncdc.noaa.gov/pub/data/noaa/$i/
  cd pub/data/noaa/$i/
  cp *.gz /home/ubuntu/work/files
  cd /home/ubuntu/work/
  rm -r pub/
done



On Mon, Feb 13, 2012 at 3:43 PM, Andy Doddington wrote:

> OK, well for starters, I think you can safely ignore the PDF data; to
> paraphrase Star Wars: “that isn’t the data
> in which you are interested”.
>
> Page 16 of the book describes the data format and refers to a data store
> that contains directories for each year from
> 1901 to 2001. It also shows the naming of .gz files within a sample
> directory (1990). The files in this directory have
> names "010010-9-1990.gz", "010014-9-1990.gz",
> "010015-9-1990.gz", and so on…
>
> Referring back to the NCDC web site, at the link below (
> http://www.ncdc.noaa.gov) and clicking on the ‘Free Data’
> link on the left-hand side of the screen brings up a new screen, as shown
> below:
>
>
> Clicking again on the ‘Free Data’ link in the middle section of this page
> brings up another page, listing the available
> data sets:
>
>
> As this page notes, although some of this data needs to be paid for, there
> is at least one ‘free’ option within
> each section. For simplicity, I went for the first one - the one labelled
> “3505 FTP data access” - which the comment
> says is free. I used anonymous FTP and found that this site contained
> directories for each year from 1901 to 2012.
> I expect the additional directories reflect the fact that time has moved
> on since the book was written :-)
>
> There are also several text or pdf files that provide further information
> on the contents of the site. I suggest you
> read some of these to get more details. One of these is called
> "ish-format-document.pdf" and it seems to describe
> the document format in some detail. If you open this, you can check
> whether it matches the format expected by
> the hadoop sample code. There is also a ‘software’ directory, which
> contains various bits of code that might
> prove useful.
>
> On drilling down into the directory for 1990, I get the following list of
> files:
>
>
> Which looks close enough to the file names in the hadoop book - I’d
> guess that these are the correct files.
>
> Given the passage of time, it is still possible that the file format has
> changed to make it incompatible with the
> hadoop code. However, it shouldn’t be that difficult to modify the code to
> suit the new format (which is very
> well documented, as already noted).
>
> Good luck!
>
> Andy
>
> ——
>
> On 12 Feb 2012, at 08:50, Bing Li wrote:
>
> Andy,
>
> Since there is a lot of data on the free data of the site, I cannot figure
> out which one is the one talked in the book. Any format differences might
> cause the source code to get exceptions. Some data is even in PDF format!
>
> Thanks so much!
> Bing
>
> On Sun, Feb 12, 2012 at 4:35 PM, Andy Doddington  >wrote:
>
> According to Page 15 of the book, this data is available from the US
>
> National Climatic Data Center, at
>
> http://www.ncdc.noaa.gov. Once you get to this site, there is a menu of
>
> links on the left-hand side of the
>
> page, listed under the heading ‘Data & Products’. I suspect that the entry
>
> labelled ‘Free Data’ is the most
>
> likely area you need to investigate :-)
>
>
> Good Luck
>
>
> Andy D
>
>
> 
>
>
> On 12 Feb 2012, at 07:14, Bing Li wrote:
>
>
> Dear all,
>
>
> I am following the book, Hadoop: the Definitive Guide. However, I got
>
> stuck
>
> because I could not get the NCDC Weather data that is used by the source
>
> code in the book. The Appendix C told me I could follow some instructions
>
> in www.hadoopbook.com. But I didn't get the instructions there. Could
>
> you
>
> give me a hand?
>
>
> Thanks so much!
>
>
> Best regards,
>
> Bing
>
>
>
>
>


Re: The NCDC Weather Data for Hadoop the Definitive Guide

2012-12-05 Thread Sujit Dhamale
To avoid the recursive folder creation, follow the steps below:


1. Create one folder on your local drive;
   I created "/home/sujit/Desktop/Data/".

2. Create the script below and run it:

for i in {1901..2012}
do
  cd /home/sujit/Desktop/Data/
  wget -r --no-parent --reject "index.html*" http://ftp3.ncdc.noaa.gov/pub/data/noaa/$i/
done
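
Once the yearly directories are on local disk, the files still have to be copied into
HDFS before the book's examples can read them. A rough sketch -- the HDFS target path
is an assumption, and the local path should be adjusted to wherever wget actually
placed the files on your machine:

bin/hadoop fs -mkdir /user/hduser/ncdc
for i in {1901..2012}
do
  bin/hadoop fs -put /home/sujit/Desktop/Data/ftp3.ncdc.noaa.gov/pub/data/noaa/$i /user/hduser/ncdc/$i
done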





On Fri, Nov 16, 2012 at 1:01 PM, Sujit Dhamale wrote:

> Hi,
> If needed, you can run the script below to store the data on your local system:
>
> for i in {1901..2012}
> do
> cd /home/ubuntu/work/
> wget -r -np -nH -R index.html http://ftp3.ncdc.noaa.gov/pub/data/noaa/$i/
> cd pub/data/noaa/$i/
> cp *.gz /home/ubuntu/work/files
> cd /home/ubuntu/work/
> rm -r pub/
> done
>
>
>
> On Mon, Feb 13, 2012 at 3:43 PM, Andy Doddington wrote:
>
>> OK, well for starters, I think you can safely ignore the PDF data; to
>> paraphrase Star Wars: “that isn’t the data
>> in which you are interested”.
>>
>> Page 16 of the book describes the data format and refers to a data store
>> that contains directories for each year from
>> 1901 to 2001. It also shows the naming of .gz files within a sample
>> directory (1990). The files in this directory have
>> names "010010-9-1990.gz", "010014-9-1990.gz",
>> "010015-9-1990.gz", and so on…
>>
>> Referring back to the NCDC web site, at the link below (
>> http://www.ncdc.noaa.gov) and clicking on the ‘Free Data’
>> link on the left-hand side of the screen brings up a new screen, as shown
>> below:
>>
>>
>> Clicking again on the ‘Free Data’ link in the middle section of this page
>> brings up another page, listing the available
>> data sets:
>>
>>
>> As this page notes, although some of this data needs to be paid for,
>> there is at least one ‘free’ option within
>> each section. For simplicity, I went for the first one - the one labelled
>> “3505 FTP data access” - which the comment
>> says is free. I used anonymous FTP and found that this site contained
>> directories for each year from 1901 to 2012.
>> I expect the additional directories reflect the fact that time has moved
>> on since the book was written :-)
>>
>> There are also several text or pdf files that provide further information
>> on the contents of the site. I suggest you
>> read some of these to get more details. One of these is called
>> "ish-format-document.pdf" and it seems to describe
>> the document format in some detail. If you open this, you can check
>> whether it matches the format expected by
>> the hadoop sample code. There is also a ‘software’ directory, which
>> contains various bits of code that might
>> prove useful.
>>
>> On drilling down into the directory for 1990, I get the following list of
>> files:
>>
>>
>> Which looks close enough to the file names in the hadoop book - I’d
>> guess that these are the correct files.
>>
>> Given the passage of time, it is still possible that the file format has
>> changed to make it incompatible with the
>> hadoop code. However, it shouldn’t be that difficult to modify the code
>> to suit the new format (which is very
>> well documented, as already noted).
>>
>> Good luck!
>>
>>  Andy
>>
>> ——
>>
>> On 12 Feb 2012, at 08:50, Bing Li wrote:
>>
>> Andy,
>>
>> Since there is a lot of data on the free data of the site, I cannot figure
>> out which one is the one talked in the book. Any format differences might
>> cause the source code to get exceptions. Some data is even in PDF format!
>>
>> Thanks so much!
>> Bing
>>
>> On Sun, Feb 12, 2012 at 4:35 PM, Andy Doddington > >wrote:
>>
>> According to Page 15 of the book, this data is available from the US
>>
>> National Climatic Data Center, at
>>
>> http://www.ncdc.noaa.gov. Once you get to this site, there is a menu of
>>
>> links on the left-hand side of the
>>
>> page, listed under the heading ‘Data & Products’. I suspect that the entry
>>
>> labelled ‘Free Data’ is the most
>>
>> likely area you need to investigate :-)
>>
>>
>> Good Luck
>>
>>
>> Andy D
>>
>>
>> 
>>
>>
>> On 12 Feb 2012, at 07:14, Bing Li wrote:
>>
>>
>> Dear all,
>>
>>
>> I am following the book, Hadoop: the Definitive Guide. However, I got
>>
>> stuck
>>
>> because I could not get the NCDC Weather data that is used by the source
>>
>> code in the book. The Appendix C told me I could follow some instructions
>>
>> in www.hadoopbook.com. But I didn't get the instructions there. Could
>>
>> you
>>
>> give me a hand?
>>
>>
>> Thanks so much!
>>
>>
>> Best regards,
>>
>> Bing
>>
>>
>>
>>
>>
>