Re: Realtime sensor's tcpip data to hadoop

2014-05-14 Thread Hardik Pandya
If I were you, I would ask the following questions to get to the answer:

> forget about Hadoop for a minute and ask yourself how the TCP/IP data is
currently being stored - in a filesystem or an RDBMS?
> hadoop is for offline batch processing - if you are looking for a real-time
streaming solution, there is Storm (open-sourced by Twitter), which goes well
with Kafka (a messaging queue from LinkedIn), or Spark Streaming (in-memory
map-reduce), which takes real-time streams - it has a built-in Twitter API, but
you need to write your own service to poll data every few seconds and feed it
in as RDDs
> storm is complementary to hadoop - spark in conjunction with hadoop will
allow you to do both offline and real-time data analytics (a minimal sketch
follows below)
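Below is a minimal Spark Streaming sketch (Java, Spark 1.x-era API) of the kind
of polling/ingest service described above. The gateway host/port and the HDFS
output path are placeholders, and writing each micro-batch to HDFS is just one
option (an HBase put per record would be another); treat it as a starting point,
not a production design.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class SensorTcpStream {
  public static void main(String[] args) throws Exception {
    SparkConf sparkConf = new SparkConf().setAppName("sensor-tcp-stream");
    // 1-second micro-batches; tune to the sensors' data rate
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, new Duration(1000));

    // "sensor-gateway":9999 is a placeholder for wherever the TCP feed is exposed
    JavaDStream<String> lines = jssc.socketTextStream("sensor-gateway", 9999);

    // each micro-batch arrives as an RDD; persist non-empty batches to HDFS
    lines.foreachRDD(new Function<JavaRDD<String>, Void>() {
      public Void call(JavaRDD<String> batch) {
        if (batch.count() > 0) {
          // placeholder HDFS URI; an HBase write could go here instead
          batch.saveAsTextFile("hdfs://namenode:8020/sensors/raw-" + System.currentTimeMillis());
        }
        return null;
      }
    });

    jssc.start();
    jssc.awaitTermination();
  }
}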




On Tue, May 6, 2014 at 10:48 PM, Alex Lee  wrote:

> Sensors may send TCP/IP data to the server. Each sensor may send TCP/IP data
> like a stream to the server; the number of sensors and the data rate are high.
>
> Firstly, how can the data from TCP/IP be put into Hadoop? It needs some
> processing and then has to be stored in HBase. Does it need to be saved to
> data files first and then put into Hadoop, or can it be done in some direct
> way from TCP/IP? Is there any software module that can take care of this? I
> found that Ganglia, Nagios and Flume may do it, but when looking into the
> details, Ganglia and Nagios are more for monitoring the Hadoop cluster
> itself, and Flume is for log files.
>
> Secondly, if the total network traffic from the sensors is over the limit of
> one LAN port, how can the load be shared? Is there any component in Hadoop
> to make this happen automatically?
>
> Any suggestions, thanks.
>


Re: Wordcount file cannot be located

2014-05-02 Thread Hardik Pandya
Please add the line below to your config - for some reason the hadoop-common
jar's FileSystem registration is being overridden - please share your feedback - thanks

config.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());



On Fri, May 2, 2014 at 12:08 AM, Alex Lee  wrote:

> I tried to add the code, but it still does not seem to work.
> http://postimg.org/image/6c1dat3jx/
>
> 2014-05-02 11:56:06,780 WARN  [main] util.NativeCodeLoader
> (NativeCodeLoader.java:(62)) - Unable to load native-hadoop library
> for your platform... using builtin-java classes where applicable
> java.io.IOException: No FileSystem for scheme: hdfs
>
> Also, the eclipse DFS location can reach the /tmp/ but cannot enter the
> /user/
>
> Any suggestion, thanks.
>
> alex
>
> --
> From: unmeshab...@gmail.com
> Date: Fri, 2 May 2014 08:43:26 +0530
> Subject: Re: Wordcount file cannot be located
> To: user@hadoop.apache.org
>
>
> Try this along with your MapReduce source code
>
> Configuration config = new Configuration();
> config.set("fs.defaultFS", "hdfs://IP:port/");
> FileSystem dfs = FileSystem.get(config);
> Path path = new Path("/tmp/in");
>
> Let me know your thoughts.
>
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
>
>


Re: Wordcount file cannot be located

2014-05-01 Thread Hardik Pandya
Hi Alex,

Your Hadoop program configuration is looking into a local filesystem directory.

By default core-site.xml points to the local file system:

<name>fs.default.name</name><value>file:///</value>

which is why your input resolves to file:/tmp/in - if the file resides on HDFS,
please point fs.default.name to HDFS:

Configuration conf = getConf();
conf.set("fs.default.name", "hdfs://localhost.localdomain:8020/");

also this can be configured in core-site.xml - please update it with the
appropriate value, e.g.:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

Thanks,
Hardik


On Thu, May 1, 2014 at 9:12 AM, Alex Lee  wrote:

> I am using eclipse(kepler) to run the wordcount example on hadoop 2.2 with
> plugin 2.2.
>
> I am trying Run as Configuration, and the Arguments is  /tmp/in /tmp/out
>
> The console always said:
> 2014-05-01 21:05:46,280 ERROR [main] security.UserGroupInformation
> (UserGroupInformation.java:doAs(1494)) - PriviledgedActionException as:root
> (auth:SIMPLE)
> cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
> path does not exist: file:/tmp/in
> org.apache.hadoop.mapreduce.lib.input.InvalidInputException:* Input path
> does not exist*: file:/tmp/in
>
> Pls see the screenshot
> http://postimg.org/image/jtpo6nox3/
>
> Both hadoop command and DFS Locations can find the file.
>
> Any suggestions, thanks.
>
> Alex
>
>


Re: Using Eclipse for Hadoop code

2014-05-01 Thread Hardik Pandya
I blogged about running a MapReduce application in Eclipse some time back:

http://letsdobigdata.wordpress.com/2013/12/07/running-hadoop-mapreduce-application-from-eclipse-kepler/


On Wed, Apr 30, 2014 at 6:53 AM, unmesha sreeveni wrote:

> Are you asking about standalone mode where we run hadoop using local fs?​​
>
>
> --
> *Thanks & Regards *
>
>
> *Unmesha Sreeveni U.B*
> *Hadoop, Bigdata Developer*
> *Center for Cyber Security | Amrita Vishwa Vidyapeetham*
> http://www.unmeshasreeveni.blogspot.in/
>
>
>


Re: HttpConfig API changes in hadoop 2.4.0

2014-05-01 Thread Hardik Pandya
https://issues.apache.org/jira/browse/HDFS-5308


On Thu, May 1, 2014 at 8:58 AM, Hardik Pandya wrote:

> You are hitting HDFS-5308 (see
> http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/CHANGES.txt):
> Replace HttpConfig#getSchemePrefix with implicit schemes in HDFS JSP.
> (Haohui Mai via jing9)
>
>
>
> On Wed, Apr 30, 2014 at 6:08 PM, Gaurav Gupta wrote:
>
>> I am trying to get the container logs url and here is the code snippet
>>
>> containerLogsUrl = HttpConfig.getSchemePrefix() + 
>> this.container.nodeHttpAddress + "/node/containerlogs/" + id + "/" + 
>> System.getenv(ApplicationConstants.Environment.USER.toString());
>>
>>
>> Thanks
>>
>> Gaurav
>>
>>
>>
>> On Wed, Apr 30, 2014 at 3:02 PM, Haohui Mai  wrote:
>>
>>> Hi,
>>>
>>> Can you describe your use cases, that is, how the prefix is used?
>>> Usually you can get around with it by generating relative URLs, which
>>> starts at "//".
>>>
>>> ~Haohui
>>>
>>>
>>> On Wed, Apr 30, 2014 at 2:31 PM, Gaurav Gupta 
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I was using hadoop 2.2.0 to build my application. I was using
>>>> HttpConfig.getSchemaPrefix() api call. When I updated hadoop to 2.4.0, the
>>>> compilation fails for my application and I see that HttpConfig
>>>> (org.apache.hadoop.http.HttpConfig) APIs have changed.
>>>>
>>>> How do I get the schrema Prefix in 2.4?
>>>>
>>>> Thanks
>>>> Gaurav
>>>>
>>>
>>>
>>
>>
>>
>


Re: HttpConfig API changes in hadoop 2.4.0

2014-05-01 Thread Hardik Pandya
You are hitting HDFS-5308: Replace HttpConfig#getSchemePrefix with implicit
schemes in HDFS JSP. (Haohui Mai via jing9)
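A hedged sketch of the workaround suggested in this thread (not an official
replacement API): build a scheme-relative URL with a "//" prefix so the client
applies whatever scheme (http or https) it is already using. The variables
mirror the snippet quoted below.

// scheme-relative URL: lets the client pick http vs https instead of HttpConfig
String containerLogsUrl = "//" + this.container.nodeHttpAddress
    + "/node/containerlogs/" + id + "/"
    + System.getenv(ApplicationConstants.Environment.USER.toString());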



On Wed, Apr 30, 2014 at 6:08 PM, Gaurav Gupta wrote:

> I am trying to get the container logs url and here is the code snippet
>
> containerLogsUrl = HttpConfig.getSchemePrefix() + 
> this.container.nodeHttpAddress + "/node/containerlogs/" + id + "/" + 
> System.getenv(ApplicationConstants.Environment.USER.toString());
>
>
> Thanks
>
> Gaurav
>
>
>
> On Wed, Apr 30, 2014 at 3:02 PM, Haohui Mai  wrote:
>
>> Hi,
>>
>> Can you describe your use cases, that is, how the prefix is used? Usually
>> you can get around with it by generating relative URLs, which starts at
>> "//".
>>
>> ~Haohui
>>
>>
>> On Wed, Apr 30, 2014 at 2:31 PM, Gaurav Gupta 
>> wrote:
>>
>>> Hi,
>>>
>>> I was using hadoop 2.2.0 to build my application. I was using
>>> HttpConfig.getSchemaPrefix() api call. When I updated hadoop to 2.4.0, the
>>> compilation fails for my application and I see that HttpConfig
>>> (org.apache.hadoop.http.HttpConfig) APIs have changed.
>>>
>>> How do I get the schrema Prefix in 2.4?
>>>
>>> Thanks
>>> Gaurav
>>>
>>
>>
>
>
>


Re: when it's safe to read map-reduce result?

2014-03-28 Thread Hardik Pandya
if the job completes without any failures, the exit code should be 0 and it is
safe to read the result; a hedged check is sketched after the example below
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyApp extends Configured implements Tool {

  public int run(String[] args) throws Exception {
    // Configuration processed by ToolRunner
    Configuration conf = getConf();

    // Create a JobConf using the processed conf
    JobConf job = new JobConf(conf, MyApp.class);

    // Process custom command-line options
    Path in = new Path(args[1]);
    Path out = new Path(args[2]);

    // Specify various job-specific parameters
    job.setJobName("my-app");
    FileInputFormat.setInputPaths(job, in);
    FileOutputFormat.setOutputPath(job, out);
    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);

    // Submit the job, then poll for progress until the job is complete
    JobClient.runJob(job);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // Let ToolRunner handle generic command-line options
    int res = ToolRunner.run(new Configuration(), new MyApp(), args);

    System.exit(res);
  }
}
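A hedged sketch of that check (same old mapred API as the example above):
runJob() blocks until completion and throws on failure, so you can also ask the
returned RunningJob explicitly, or test for the _SUCCESS marker before reading
the output directory.

// inside run(), replacing the bare JobClient.runJob(job) call above
// (also import org.apache.hadoop.mapred.RunningJob and org.apache.hadoop.fs.FileSystem)
RunningJob running = JobClient.runJob(job);   // blocks; throws IOException on job failure
if (!running.isSuccessful()) {
  return 1;                                   // non-zero exit code, do not read output
}
// optional extra guard: the output committer writes _SUCCESS on successful completion
FileSystem fs = out.getFileSystem(conf);
return fs.exists(new Path(out, "_SUCCESS")) ? 0 : 1;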



On Fri, Mar 28, 2014 at 4:41 AM, Li Li  wrote:

> thanks. is the following codes safe?
> int exitCode=ToolRunner.run()
> if(exitCode==0){
>//safe to read result
> }
>
> On Fri, Mar 28, 2014 at 4:36 PM, Dieter De Witte 
> wrote:
> > _SUCCESS implies that the job has successfully terminated, so this seems
> > like a reasonable criterion.
> >
> > Regards, Dieter
> >
> >
> > 2014-03-28 9:33 GMT+01:00 Li Li :
> >
> >> I have a program that does some map-reduce job and then reads the result
> >> of the job.
> >> I learned that HDFS is not strongly consistent. When is it safe to read the
> >> result?
> >> As long as output/_SUCCESS exists?
> >
> >
>


Re: Hadoop documentation: control flow and FSM diagrams

2014-03-28 Thread Hardik Pandya
Very helpful indeed Emilio, thanks!


On Fri, Mar 28, 2014 at 12:58 PM, Emilio Coppa  wrote:

> Hi All,
>
> I have created a wiki on github:
>
> https://github.com/ercoppa/HadoopDiagrams/wiki
>
> This is an effort to provide an updated documentation of how the internals
> of Hadoop work.  The main idea is to help the user understand the "big
> picture" without removing too much internal details. You can find several
> diagrams (e.g. Finite State Machine and control flow). They are based on
> Hadoop 2.3.0.
>
> Notice that:
>
> - they are not specified in any formal language (e.g., UML) but they
> should be easy to understand (Do you agree?)
> - they cover only some aspects of Hadoop but I am improving them day after
> day
> - they are not always correct but I am trying to fix errors,
> remove ambiguities, etc
>
> I hope this can be helpful to somebody out there. Any feedback from you
> may be valuable for me.
>
> Emilio.
>


Re: reducing HDFS FS connection timeouts

2014-03-28 Thread Hardik Pandya
how about adding:

ipc.client.connect.max.retries.on.timeouts = 2 (the default is 45)
"Indicates the number of retries a client will make on socket timeout to
establish a server connection."

does that help?
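A hedged sketch combining the properties mentioned in this thread (a fragment,
in the same style as the snippet quoted below; the exact behaviour of each
retry/timeout knob varies a bit across Hadoop versions, so treat the values as
a starting point, and the NameNode URI is a placeholder):

Configuration conf = new Configuration();
conf.setInt("ipc.client.connect.max.retries", 2);             // per-connection retries
conf.setInt("ipc.client.connect.max.retries.on.timeouts", 2); // default is 45
conf.setInt("ipc.client.connect.timeout", 7000);              // socket connect timeout, ms
FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);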


On Thu, Mar 27, 2014 at 4:23 PM, John Lilley wrote:

>  It seems to take a very long time to timeout a connection to an invalid
> NN URI.  Our application is interactive so the defaults of taking many
> minutes don't work well.  I've tried setting:
>
> conf.set("ipc.client.connect.max.retries", "2");
>
> conf.set("ipc.client.connect.timeout", "7000");
>
> before calling FileSystem.get() but it doesn't seem to matter.
>
> What is the prescribed technique for lowering connection timeout to HDFS?
>
> Thanks
>
> john
>
>
>


Re: How to get locations of blocks programmatically?

2014-03-28 Thread Hardik Pandya
have you looked into the FileSystem API? This is Hadoop v2.2.0:

http://hadoop.apache.org/docs/r2.2.0/api/org/apache/hadoop/fs/FileSystem.html

These methods do not exist in v1.2.0:
http://hadoop.apache.org/docs/r1.2.0/api/org/apache/hadoop/fs/FileSystem.html

RemoteIterator<LocatedFileStatus> listFiles(Path f, boolean recursive)
  List the statuses and block locations of the files in the given path.

RemoteIterator<LocatedFileStatus> listLocatedStatus(Path f)
  List the statuses of the files/directories in the given path if
the path is a directory.
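A minimal sketch against the 2.2.0 API (the NameNode URI and the path are
placeholders; assumes it runs inside a method that throws IOException):

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020/"), conf);
RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path("/user/libo/data"), true);
while (files.hasNext()) {
  LocatedFileStatus status = files.next();
  // each LocatedFileStatus carries the block locations, including the hosts
  for (BlockLocation block : status.getBlockLocations()) {
    StringBuilder hosts = new StringBuilder();
    for (String host : block.getHosts()) {
      hosts.append(host).append(' ');
    }
    System.out.println(status.getPath() + " offset=" + block.getOffset()
        + " length=" + block.getLength() + " hosts=" + hosts);
  }
}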


On Thu, Mar 27, 2014 at 10:03 PM, Libo Yu  wrote:

> Hi all,
>
> "hadoop path fsck -files -block -locations" can list locations for all
> blocks in the path.
> Is it possible to list all blocks and the block locations for a given path
> programmatically?
> Thanks,
>
> Libo
>


Re: Why is HDFS_BYTES_WRITTEN is much larger than HDFS_BYTES_READ in this case?

2014-03-28 Thread Hardik Pandya
what is your compression format - gzip, LZO or Snappy?

for LZO-compressed final output:

FileOutputFormat.setCompressOutput(conf, true);
FileOutputFormat.setOutputCompressorClass(conf, LzoCodec.class);

In addition, to make LZO output splittable, you need to build an LZO index file for it.
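If you are not sure whether an input sequence file is compressed (the situation
in the question below), a hedged way to check with the old API (the path is a
placeholder):

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("/tmp/in/data.seq"), conf);
try {
  // the sequence file header records whether and how the values are compressed
  System.out.println("compressed=" + reader.isCompressed()
      + " blockCompressed=" + reader.isBlockCompressed()
      + " codec=" + reader.getCompressionCodec());
} finally {
  reader.close();
}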


On Thu, Mar 27, 2014 at 8:57 PM, Kim Chew  wrote:

> Thanks folks.
>
> I was not aware my input data file had been compressed.
> FileOutputFormat.setCompressOutput() was set to true when the file was
> written. 8-(
>
> Kim
>
>
> On Thu, Mar 27, 2014 at 5:46 PM, Mostafa Ead wrote:
>
>> The following might answer you partially:
>>
>> Input key is not read from HDFS, it is auto generated as the offset of
>> the input value in the input file. I think that is (partially) why read
>> hdfs bytes is smaller than written hdfs bytes.
>>  On Mar 27, 2014 1:34 PM, "Kim Chew"  wrote:
>>
>>> I am also wondering: if, say, I have two identical timestamps, they are
>>> going to be written to the same file. Does MultipleOutputs handle appending?
>>>
>>> Thanks.
>>>
>>> Kim
>>>
>>>
>>> On Thu, Mar 27, 2014 at 12:30 PM, Thomas Bentsen  wrote:
>>>
 Have you checked the content of the files you write?


 /th

 On Thu, 2014-03-27 at 11:43 -0700, Kim Chew wrote:
 > I have a simple M/R job using Mapper only thus no reducer. The mapper
 > read a timestamp from the value, generate a path to the output file
 > and writes the key and value to the output file.
 >
 >
 > The input file is a sequence file, not compressed and stored in the
 > HDFS, it has a size of 162.68 MB.
 >
 >
 > Output also is written as a sequence file.
 >
 >
 >
 > However, after I ran my job, I have two output part files from the
 > mapper. One has a size of 835.12 MB and the other has a size of 224.77
 > MB. So why is the total outputs size is so much larger? Shouldn't it
 > be more or less equal to the input's size of 162.68MB since I just
 > write the key and value passed to mapper to the output?
 >
 >
 > Here is the mapper code snippet,
 >
 > public void map(BytesWritable key, BytesWritable value, Context
 > context) throws IOException, InterruptedException {
 >
 > long timestamp = bytesToInt(value.getBytes(),
 > TIMESTAMP_INDEX);;
 > String tsStr = sdf.format(new Date(timestamp * 1000L));
 >
 > mos.write(key, value, generateFileName(tsStr)); // mos is a
 > MultipleOutputs object.
 > }
 >
 > private String generateFileName(String key) {
 > return outputDir+"/"+key+"/raw-vectors";
 > }
 >
 >
 > And here are the job outputs,
 >
 > 14/03/27 11:00:56 INFO mapred.JobClient: Launched map tasks=2
 > 14/03/27 11:00:56 INFO mapred.JobClient: Data-local map tasks=2
 > 14/03/27 11:00:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
 > 14/03/27 11:00:56 INFO mapred.JobClient:   File Output Format
 > Counters
 > 14/03/27 11:00:56 INFO mapred.JobClient: Bytes Written=0
 > 14/03/27 11:00:56 INFO mapred.JobClient:   FileSystemCounters
 > 14/03/27 11:00:56 INFO mapred.JobClient: HDFS_BYTES_READ=171086386
 > 14/03/27 11:00:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=54272
 > 14/03/27 11:00:56 INFO mapred.JobClient:
 > HDFS_BYTES_WRITTEN=374798
 > 14/03/27 11:00:56 INFO mapred.JobClient:   File Input Format Counters
 > 14/03/27 11:00:56 INFO mapred.JobClient: Bytes Read=170782415
 > 14/03/27 11:00:56 INFO mapred.JobClient:   Map-Reduce Framework
 > 14/03/27 11:00:56 INFO mapred.JobClient: Map input records=547
 > 14/03/27 11:00:56 INFO mapred.JobClient: Physical memory (bytes)
 > snapshot=166428672
 > 14/03/27 11:00:56 INFO mapred.JobClient: Spilled Records=0
 > 14/03/27 11:00:56 INFO mapred.JobClient: Total committed heap
 > usage (bytes)=38351872
 > 14/03/27 11:00:56 INFO mapred.JobClient: CPU time spent (ms)=20080
 > 14/03/27 11:00:56 INFO mapred.JobClient: Virtual memory (bytes)
 > snapshot=1240104960
 > 14/03/27 11:00:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=286
 > 14/03/27 11:00:56 INFO mapred.JobClient: Map output records=0
 >
 >
 > TIA,
 >
 >
 > Kim
 >



>>>
>


Re: /home/r9r/hadoop-2.2.0/bin/hadoop: line 133: /usr/java/default/bin/java: No such file or directory

2014-01-08 Thread Hardik Pandya
your Java home is not set correctly - it is still looking under
/usr/java/default/bin/java

in your hadoop-env.sh, JAVA_HOME should be /usr/lib/jvm/java-1.7.0/jre/

does your $PATH include the correct ${JAVA_HOME}/bin?
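A minimal sketch of the relevant line (assuming the standard 2.2.0 layout where
hadoop-env.sh lives under etc/hadoop/; the JRE path is the one from this thread):

# $HADOOP_INSTALL/etc/hadoop/hadoop-env.sh -- set it explicitly instead of relying on ${JAVA_HOME}
export JAVA_HOME=/usr/lib/jvm/java-1.7.0/jre
# then verify from the shell
$JAVA_HOME/bin/java -version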





On Wed, Jan 8, 2014 at 3:12 PM, Allen, Ronald L.  wrote:

> Hello again,
>
> I'm trying to install Hadoop 2.2.0 on Redhat 2.6.32-358.23.2.el6.x86_64.
>  I have untar-ed hadoop-2.2.0.tar.gz and set my path variables as below.
>
> export HADOOP_INSTALL=/home/r9r
> export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
> export HADOOP_COMMON_HOME=$HADOOP_INSTALL
> export HADOOP_HDFS_HOME=$HADOOP_INSTALL
> export YARN_HOME=$HADOOP_INSTALL
> export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
> export HADOOP_OPTS="-Djava.library.path"=$HADOOP_INSTALL/lib
>
> export JAVA_HOME=/usr/lib/jvm/java-1.7.0/jre/
>
> When I type hadoop version, I get the following:
>
> /home/r9r/hadoop-2.2.0/bin/hadoop: line 133: /usr/java/default/bin/java:
> No such file or directory
> /home/r9r/hadoop-2.2.0/bin/hadoop: line 133: exec:
> /usr/java/default/bin/java: cannot execute: No such file or directory
>
> I've checked and rechecked my JAVA_HOME.  I feel like it is correct.  I've
> checked hadoop-env.sh and it is set to export JAVA_HOME=${JAVA_HOME}
>
> I am stuck and do not know what to try from here.  Does anyone have any
> ideas?
>
> Thanks!
> Ronnie


Re: JAVA cannot execute binary file

2014-01-07 Thread Hardik Pandya
are you exporting JAVA_HOME in hadoop-env.sh?

export JAVA_HOME=/usr/lib/jvm/java-7-oracle

Also, make sure your Java is installed correctly.

To install Java, add the following PPA and install the latest Oracle Java (JDK) 7 on Ubuntu:

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update && sudo apt-get install oracle-jdk7-installer

Check whether your OS uses Java JDK 7:

java -version

If it is installed correctly you will get a result like:

java version "1.7.0_17"
Java(TM) SE Runtime Environment (build 1.7.0_17-b02)
Java HotSpot(TM) 64-Bit Server VM (build 23.7-b01, mixed mode)


On Tue, Jan 7, 2014 at 3:12 PM, navaz  wrote:

> Hi
>
> I am following the hadoop document.
>
> https://www.dropbox.com/s/05aurcp42asuktp/Chiu%20Hadoop%20Pig%20Install%20Instructions.docx
> Installed JDK7.
>
> VM:
> 
>
> hduser@base:~$ lsb_release -a
> No LSB modules are available.
> Distributor ID: Ubuntu
> Description:Ubuntu 13.04
> Release:13.04
> Codename:   raring
> hduser@base:~$
>
> But I am getting the below error message.
>
> hduser@base:~$ java -version
> -su: /usr/bin/java: cannot execute binary file
> hduser@base:~$
>
>
> hduser@base:~$ /usr/local/hadoop/bin/hadoop namenode -format
> /usr/local/hadoop/bin/hadoop: line 320: /usr/lib/jvm/jdk1.7.0//bin/java:
> cannot execute binary file
> /usr/local/hadoop/bin/hadoop: line 390: /usr/lib/jvm/jdk1.7.0//bin/java:
> cannot execute binary file
> /usr/local/hadoop/bin/hadoop: line 390: /usr/lib/jvm/jdk1.7.0//bin/java:
> Success
> hduser@base:~$
> hduser@base:~$
>
>
> Could you please help me in this.
>
> Thanks & Regrads
> *Abdul Navaz*
>
>
>


Re: Error: Could not find or load main class hdfs

2014-01-07 Thread Hardik Pandya
hadoop fs -copyFromLocal <local source> <HDFS destination>

e.g. hadoop fs -copyFromLocal file1.txt /user/home/hadoop/input/file1.txt


On Tue, Jan 7, 2014 at 11:41 AM, Allen, Ronald L.  wrote:

> Thank you for responding.
>
> I entered hadoop fs -ls and it returned this:
>
> 14/01/07 11:40:50 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> Found 2 items
> drwxr-xr-x   - r9rhadoop supergroup  0 2014-01-06 13:49 input
> drwxr-xr-x   - r9rhadoop supergroup  0 2014-01-06 15:30 out
>
> I guess it does.
> ________
> From: Hardik Pandya [smarty.ju...@gmail.com]
> Sent: Tuesday, January 07, 2014 11:37 AM
> To: user@hadoop.apache.org
> Subject: Re: Error: Could not find or load main class hdfs
>
> does input directory exist in hdfs? you can check by hadoop fs -ls
>
>
> On Tue, Jan 7, 2014 at 11:16 AM, Allen, Ronald L. wrote:
> Hello,
>
> I am trying to run the WordCount example using Hadoop 2.2.0 on a single
> node.  I tried to follow the directions from
> http://nextgenhadoop.blogspot.in/2013/10/steps-to-install-hadoop-220-stable.html.
>  However, when I type in the command bin/hadoop hdfs -copyFromLocal input
> /input, I get Error: Could not find or load main class hdfs.  I am very new
> to this and have no idea what is going on.  Any help would be greatly
> appreciated!
>
> Thanks,
> Ronnie
>
>


Re: Error: Could not find or load main class hdfs

2014-01-07 Thread Hardik Pandya
does the input directory exist in HDFS? you can check with hadoop fs -ls


On Tue, Jan 7, 2014 at 11:16 AM, Allen, Ronald L.  wrote:

> Hello,
>
> I am trying to run the WordCount example using Hadoop 2.2.0 on a single
> node.  I tried to follow the directions from
> http://nextgenhadoop.blogspot.in/2013/10/steps-to-install-hadoop-220-stable.html.
>  However, when I type in the command bin/hadoop hdfs -copyFromLocal input
> /input, I get Error: Could not find or load main class hdfs.  I am very new
> to this and have no idea what is going on.  Any help would be greatly
> appreciated!
>
> Thanks,
> Ronnie


Re: Content of FSImage

2014-01-07 Thread Hardik Pandya
Yes - The entire file system namespace, including the mapping of blocks to
files and file system properties, is stored in a file called the FsImage.
The FsImage is stored as a file in the NameNode’s local file system too.

When the NameNode starts up, it reads the FsImage and EditLog from disk,
applies all the transactions from the EditLog to the in-memory
representation of the FsImage, and flushes out this new version into a new
FsImage on disk. It can then truncate the old EditLog because its
transactions have been applied to the persistent FsImage. This process is
called a checkpoint. In the current implementation, a checkpoint only
occurs when the NameNode starts up. Work is in progress to support periodic
checkpointing in the near future.
During Metadata Disk Failure

The FsImage and the EditLog are central data structures of HDFS. A
corruption of these files can cause the HDFS instance to be non-functional.
For this reason, the NameNode can be configured to support maintaining
multiple copies of the FsImage and EditLog. Any update to either the
FsImage or EditLog causes each of the FsImages and EditLogs to get updated
synchronously. This synchronous updating of multiple copies of the
FsImage and EditLog may degrade the rate of namespace transactions per
second that a NameNode can support. However, this degradation is acceptable because
even though HDFS applications are very data intensive in nature, they are
not metadata intensive. When a NameNode restarts, it selects the latest
consistent FsImage and EditLog to use.

Reference: HDFS Design documentation


On Tue, Jan 7, 2014 at 7:26 AM, Vishnu Viswanath <
vishnu.viswanat...@gmail.com> wrote:

> Hi All,
>
> I read that block information is stored in memory by hadoop once it
> receives block report from the datanodes.
>
> EditLog logs the changes.
>
> What is exactly stored in the FSImage file?
> Does it store information on the files in HDFS and how many blocks are
> there etc?
>
> Thanks
>
>


Re: Spill Failed Caused by ArrayIndexOutOfBoundsException

2014-01-06 Thread Hardik Pandya
The error is happening during the sort-and-spill phase:

org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill

It seems like the registered key comparator (BooleanWritable's, which reads an
int from the serialized key buffer) fails during compare:

Caused by: java.lang.ArrayIndexOutOfBoundsException: 99614720
at org.apache.hadoop.io.WritableComparator.readInt(WritableComparator.java:158)
at org.apache.hadoop.io.BooleanWritable$Comparator.compare(BooleanWritable.java:103)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1116)
at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1404)
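One hedged thing to double-check (an assumption, not a confirmed root cause):
the comparator that runs in sortAndSpill comes from the job's map output key
class (or an explicitly set comparator), so what is registered in the driver
has to agree with the types the mapper actually emits, for example:

// hypothetical driver fragment (new API), matching the BooleanWritable comparator in the trace
job.setMapOutputKeyClass(BooleanWritable.class);  // must match the key type passed to context.write()
job.setMapOutputValueClass(Text.class);           // must match the value type passed to context.write()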


On Mon, Jan 6, 2014 at 3:21 PM, Paul Mahon  wrote:

> I have a hadoop program that I'm running with version 1.2.1 which
> fails in a peculiar place. Most mappers complete without error, but
> some fail with this stack trace:
>
> java.io.IOException: Spill failed
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1297)
> at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:698)
> at org.apache.hadoop.mapred.MapTask.closeQuietly(MapTask.java:1793)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:779)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
>
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 99614720
> at
>
> org.apache.hadoop.io.WritableComparator.readInt(WritableComparator.java:158)
> at
>
> org.apache.hadoop.io.BooleanWritable$Comparator.compare(BooleanWritable.java:103)
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.compare(MapTask.java:1116)
> at org.apache.hadoop.util.QuickSort.sortInternal(QuickSort.java:95)
> at org.apache.hadoop.util.QuickSort.sort(QuickSort.java:59)
> at
>
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1404)
> at
>
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:858)
> at
>
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1349)
>
> I've noticed that that array index is exactly the size of the bufvoid,
> but I'm not sure if that has any significance.
>
> The exception isn't happening in my WritableComparable or any of my
> code, it's all in hadoop. I'm not sure what to do to track down what
> I'm doing to cause the problem. Has anyone seen a problem like this or
> have any suggestions of where to look for the problem in my code?
>


Re: Fine tunning

2014-01-06 Thread Hardik Pandya
Can you please share how you are doing the lookup?




On Mon, Jan 6, 2014 at 4:23 AM, Ranjini Rathinam wrote:

> Hi,
>
> I have an input file with 16 fields in it.
>
> Using MapReduce code I need to load the HBase tables.
>
> The first eight fields have to go into one HBase table and the last eight
> into another HBase table.
>
> The data is loaded into the HBase table in 0.11 sec, but a lookup has been
> added in the MapReduce code: for example, the input file has an attribute
> named currency, and there is a master table for currency; both values need
> to be matched before writing.
>
> The table which involves the lookup takes a long time to load. For 13,250
> records it takes 59 mins.
>
> How can I fine-tune this to reduce the loading time?
>
> Please help.
>
> Thanks in advance.
>
> Ranjini.R
>
>
>


Re: Understanding MapReduce source code : Flush operations

2014-01-06 Thread Hardik Pandya
Please do not tell me that in the last 2.5 years you have not used a virtual
Hadoop environment to debug your MapReduce application before deploying to a
production environment.

No one can stop you from looking at the code - Hadoop and its ecosystem are
open source.


On Mon, Jan 6, 2014 at 9:35 AM, nagarjuna kanamarlapudi <
nagarjuna.kanamarlap...@gmail.com> wrote:

>
>
> -- Forwarded message --
> From: nagarjuna kanamarlapudi 
> Date: Mon, Jan 6, 2014 at 6:39 PM
> Subject: Understanding MapReduce source code : Flush operations
> To: mapreduce-u...@hadoop.apache.org
>
>
> Hi,
>
> I have been using Hadoop / MapReduce for about 2.5 years. I want to understand
> the internals of the Hadoop source code.
>
> Let me put my requirement very clear.
>
> I want to have a look at the code where of flush operations that happens
> after the reduce phase.
>
> The reducer writes the output to the OutputFormat, which in turn pushes it to
> memory, and once it reaches 90% of the chunk size it starts to flush the
> reducer output.
>
> I essentially want to look at the code of that flushing operation.
>
>
>
>
> Regards,
> Nagarjuna K
>
>


Re: LocalResource size/time limits

2014-01-04 Thread Hardik Pandya
Maybe this would clarify some aspects of your questions:

Resource Localization in YARN Deep Dive

The threshold for local files is dictated by the configuration property
*yarn.nodemanager.localizer.cache.target-size-mb* described below:
http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

yarn.nodemanager.localizer.cache.target-size-mb = 10240 (default)
Target size of localizer cache in MB, per local directory.


On Sat, Jan 4, 2014 at 12:02 PM, John Lilley wrote:

>  Are there any limits on the total size of LocalResources that a YARN app
> requests?  Do the PUBLIC ones age out of cache over time?  Are there
> settable controls?
>
> Thanks
>
> John
>
>
>
>
>


Re: How to remove slave nodes?

2014-01-04 Thread Hardik Pandya
also you can exclude DataNodes via dfs.hosts.exclude (an HDFS setting, in
conf/hdfs-site.xml):

dfs.hosts / dfs.hosts.exclude - list of permitted/excluded DataNodes. If
necessary, use these files to control the list of allowable DataNodes.
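A hedged sketch of the usual decommissioning flow when you want the NameNode to
stop using a DataNode (file locations are placeholders for a Hadoop 1.x layout):

<!-- hdfs-site.xml on the NameNode; the excludes path is a placeholder -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/usr/local/hadoop/conf/excludes</value>
</property>

# add the problematic DataNode's hostname to the excludes file, then tell the NameNode:
hadoop dfsadmin -refreshNodes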


On Sat, Jan 4, 2014 at 12:37 PM, Hardik Pandya wrote:

> You can  start/stop an Hadoop daemon manually on a machine via 
> bin/hadoop-daemon.sh
> start/stop [namenode | secondarynamenode | datanode | jobtracker |
> tasktracker]
>
>
> On Fri, Jan 3, 2014 at 11:47 AM, navaz  wrote:
>
>> How do I remove one of the slave nodes?
>>
>> I have a namenode (master) and 3 datanodes (slaves) running. I would like
>> to remove one of the problematic datanodes. How can I do this?
>>
>> Unfortunately I don't have access to that problematic data node.
>>
>> Thanks
>> Navaz
>>
>>
>


Re: How to remove slave nodes?

2014-01-04 Thread Hardik Pandya
You can start/stop a Hadoop daemon manually on a machine via:

bin/hadoop-daemon.sh start/stop [namenode | secondarynamenode | datanode | jobtracker | tasktracker]


On Fri, Jan 3, 2014 at 11:47 AM, navaz  wrote:

> How do I remove one of the slave nodes?
>
> I have a namenode (master) and 3 datanodes (slaves) running. I would like
> to remove one of the problematic datanodes. How can I do this?
>
> Unfortunately I don't have access to that problematic data node.
>
> Thanks
> Navaz
>
>


Re: Map succeeds but reduce hangs

2014-01-01 Thread Hardik Pandya
do you have your hostnames properly configured in /etc/hosts? have you tried
the real LAN addresses (192.168.x.x) instead of localhost/127.0.0.1?
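A hedged sketch of what /etc/hosts can look like on every node (the addresses
are placeholders); the important part is that each node's own hostname must not
resolve to 127.0.0.1/127.0.1.1, otherwise the trackers advertise addresses like
tracker_slaveN:localhost/127.0.0.1 as in the log below:

# /etc/hosts (same on master and all slaves; addresses are examples)
192.168.1.10   master
192.168.1.11   slave1
192.168.1.12   slave2
192.168.1.13   slave3
127.0.0.1      localhost
# do not also map the machine's own hostname to 127.0.0.1 / 127.0.1.1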



On Wed, Jan 1, 2014 at 11:33 AM, navaz  wrote:

> Thanks. But I wonder why map succeeds 100% - how does it resolve the hostname?
>
> Now reduce reaches 100% but slave2 and slave3 are bailing out (even though
> mapping succeeded on these nodes).
>
> Does it look up the hostname only for reduce?
>
>
> 14/01/01 09:09:38 INFO mapred.JobClient: Running job: job_201401010908_0001
> 14/01/01 09:09:39 INFO mapred.JobClient:  map 0% reduce 0%
> 14/01/01 09:10:00 INFO mapred.JobClient:  map 33% reduce 0%
> 14/01/01 09:10:01 INFO mapred.JobClient:  map 66% reduce 0%
> 14/01/01 09:10:05 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:10:14 INFO mapred.JobClient:  map 100% reduce 22%
> 14/01/01 09:17:32 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:17:35 INFO mapred.JobClient: Task Id :
> attempt_201401010908_0001_r_00_0, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:17:46 INFO mapred.JobClient:  map 100% reduce 11%
> 14/01/01 09:17:50 INFO mapred.JobClient:  map 100% reduce 22%
> 14/01/01 09:25:06 INFO mapred.JobClient:  map 100% reduce 0%
> 14/01/01 09:25:10 INFO mapred.JobClient: Task Id :
> attempt_201401010908_0001_r_00_1, Status : FAILED
> Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 14/01/01 09:25:34 INFO mapred.JobClient:  map 100% reduce 100%
> 14/01/01 09:25:42 INFO mapred.JobClient: Job complete:
> job_201401010908_0001
> 14/01/01 09:25:42 INFO mapred.JobClient: Counters: 29
>
>
>
> Job Tracker logs:
> 2014-01-01 09:09:59,874 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201401010908_0001_m_02_0' has completed task_20140
> 1010908_0001_m_02 successfully.
> 2014-01-01 09:10:04,231 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201401010908_0001_m_01_0' has completed task_20140
> 1010908_0001_m_01 successfully.
> 2014-01-01 09:17:30,527 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201401010908_0001_r_00_0: Shuffle Error: Exc
> eeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2014-01-01 09:17:30,528 INFO org.apache.hadoop.mapred.JobTracker: Removing
> task 'attempt_201401010908_0001_r_00_0'
> 2014-01-01 09:17:30,529 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (TASK_CLEANUP) 'attempt_201401010908_0001_r_00_0' to ti
> p task_201401010908_0001_r_00, for tracker 'tracker_slave3:localhost/
> 127.0.0.1:44663'
> 2014-01-01 09:17:35,130 INFO org.apache.hadoop.mapred.JobTracker: Removing
> task 'attempt_201401010908_0001_r_00_0'
> 2014-01-01 09:17:35,213 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (REDUCE) 'attempt_201401010908_0001_r_00_1' to tip task
> _201401010908_0001_r_00, for tracker 'tracker_slave2:localhost/
> 127.0.0.1:51438'
> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201401010908_0001_r_00_1: Shuffle Error: Exc
> eeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2014-01-01 09:25:05,493 INFO org.apache.hadoop.mapred.JobTracker: Removing
> task 'attempt_201401010908_0001_r_00_1'
> 2014-01-01 09:25:05,494 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (TASK_CLEANUP) 'attempt_201401010908_0001_r_00_1' to ti
> p task_201401010908_0001_r_00, for tracker 'tracker_slave2:localhost/
> 127.0.0.1:51438'
> 2014-01-01 09:25:10,087 INFO org.apache.hadoop.mapred.JobTracker: Removing
> task 'attempt_201401010908_0001_r_00_1'
> 2014-01-01 09:25:10,109 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (REDUCE) 'attempt_201401010908_0001_r_00_2' to tip task
> _201401010908_0001_r_00, for tracker 'tracker_master:localhost/
> 127.0.0.1:57156'
> 2014-01-01 09:25:33,340 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201401010908_0001_r_00_2' has completed task_20140
> 1010908_0001_r_00 successfully.
> 2014-01-01 09:25:33,462 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (JOB_CLEANUP) 'attempt_201401010908_0001_m_03_0' to tip
>  task_201401010908_0001_m_03, for tracker 'tracker_master:localhost/
> 127.0.0.1:57156'
> 2014-01-01 09:25:42,304 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201401010908_0001_m_03_0' has completed task_20140
> 1010908_0001_m_03 successfully.
>
>
> On Tue, Dec 31, 2013 at 4:56 PM, Hardik Pandya wrote:
>
>> as expected, it's failing during shuffle
>>
>> it seems like hdfs could not resolve the hostnames

Re: Map succeeds but reduce hangs

2013-12-31 Thread Hardik Pandya
2311107_0004_m_04, for tracker 'tracker_slave1:localhost/
> 127.0.0.1:57492'
> 2013-12-31 14:27:40,876 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201312311107_0004_m_04_0' has completed task_20131
> 2311107_0004_m_04 successfully.
> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (MAP) 'attempt_201312311107_0004_m_00_0' to tip task_20
> 1312311107_0004_m_00, for tracker 'tracker_slave1:localhost/
> 127.0.0.1:57492'
> 2013-12-31 14:27:40,878 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_201312311107_0004_m_00
> 2013-12-31 14:27:40,907 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (MAP) 'attempt_201312311107_0004_m_01_0' to tip task_20
> 1312311107_0004_m_01, for tracker 'tracker_slave2:localhost/
> 127.0.0.1:52677'
> 2013-12-31 14:27:40,908 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_201312311107_0004_m_01
> 2013-12-31 14:27:41,122 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (MAP) 'attempt_201312311107_0004_m_02_0' to tip task_20
> 1312311107_0004_m_02, for tracker 'tracker_slave3:localhost/
> 127.0.0.1:46845'
> 2013-12-31 14:27:41,123 INFO org.apache.hadoop.mapred.JobInProgress:
> Choosing data-local task task_201312311107_0004_m_02
> 2013-12-31 14:27:49,659 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201312311107_0004_m_02_0' has completed task_20131
> 2311107_0004_m_02 successfully.
> 2013-12-31 14:27:49,662 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (REDUCE) 'attempt_201312311107_0004_r_00_0' to tip task
> _201312311107_0004_r_00, for tracker 'tracker_slave3:localhost/
> 127.0.0.1:46845'
> 2013-12-31 14:27:50,338 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201312311107_0004_m_00_0' has completed task_20131
> 2311107_0004_m_00 successfully.
> 2013-12-31 14:27:51,168 INFO org.apache.hadoop.mapred.JobInProgress: Task
> 'attempt_201312311107_0004_m_01_0' has completed task_20131
> 2311107_0004_m_01 successfully.
> 2013-12-31 14:27:54,207 INFO org.apache.hadoop.mapred.TaskInProgress:
> Error from attempt_201312311107_0003_r_00_0: Shuffle Error: Exc
> eeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
> 2013-12-31 14:27:54,208 INFO org.apache.hadoop.mapred.JobTracker: Removing
> task 'attempt_201312311107_0003_r_00_0'
> 2013-12-31 14:27:54,209 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (TASK_CLEANUP) 'attempt_201312311107_0003_r_00_0' to ti
> p task_201312311107_0003_r_00, for tracker 'tracker_slave2:localhost/
> 127.0.0.1:52677'
> 2013-12-31 14:27:58,797 INFO org.apache.hadoop.mapred.JobTracker: Removing
> task 'attempt_201312311107_0003_r_00_0'
> 2013-12-31 14:27:58,815 INFO org.apache.hadoop.mapred.JobTracker: Adding
> task (REDUCE) 'attempt_201312311107_0003_r_00_1' to tip task
> _201312311107_0003_r_00, for tracker 'tracker_slave1:localhost/
> 127.0.0.1:57492'
> hduser@pc228:/usr/local/hadoop/logs$
>
>
> I am referring the below document to configure hadoop cluster.
>
>
> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
>
> Did i miss something ? Pls guide.
>
> Thanks
> Navaz
>
>
> On Tue, Dec 31, 2013 at 3:25 PM, Hardik Pandya wrote:
>
>> what does your job log say? is your hdfs-site configured properly to
>> find the 3 data nodes? this could very well be getting stuck in the shuffle phase
>>
>> last thing to try: do stop-all and start-all help? as a last resort, try
>> formatting the namenode
>>
>>
>> On Tue, Dec 31, 2013 at 11:40 AM, navaz  wrote:
>>
>>> Hi
>>>
>>>
>>> I am running Hadoop cluster with 1 name node and 3 data nodes.
>>>
>>> My HDFS looks like this.
>>>
>>> hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
>>> Warning: $HADOOP_HOME is deprecated.
>>>
>>> Found 7 items
>>> -rw-r--r--   4 hduser supergroup 343691 2013-12-30 19:12
>>> /user/hduser/getty/gutenberg/pg132.txt
>>> -rw-r--r--   4 hduser supergroup 594933 2013-12-30 19:12
>>> /user/hduser/getty/gutenberg/pg1661.txt
>>> -rw-r--r--   4 hduser supergroup1945886 2013-12-30 19:12
>>> /user/hduser/getty/gutenberg/pg19699.txt
>>> -rw-r--r--   4 hduser supergroup 674570 2013-12-30 19:12
>>> /user/hduser/getty/gutenberg/pg20417.txt
>>> -rw-r--r--   4 hduser supergroup 

Re: block replication

2013-12-31 Thread Hardik Pandya

<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>
  <description>Determines datanode heartbeat interval in seconds.</description>
</property>

and maybe you are looking for

<property>
  <name>dfs.namenode.stale.datanode.interval</name>
  <value>30000</value>
  <description>
    Default time interval for marking a datanode as "stale", i.e., if
    the namenode has not received heartbeat msg from a datanode for
    more than this time interval, the datanode will be marked and treated
    as "stale" by default. The stale interval cannot be too small since
    otherwise this may cause too frequent change of stale states.
    We thus set a minimum stale interval value (the default value is 3 times
    of heartbeat interval) and guarantee that the stale interval cannot be less
    than the minimum value.
  </description>
</property>


On Fri, Dec 27, 2013 at 10:10 PM, Vishnu Viswanath <
vishnu.viswanat...@gmail.com> wrote:

> well, I couldn't find any property in
> http://hadoop.apache.org/docs/r1.2.1/hdfs-default.html that sets the time
> interval to consider a node as dead.
>
> I saw there is a property dfs.namenode.heartbeat.recheck-interval or
> heartbeat.recheck.interval, but I couldn't find it there. Is it removed,
> or am I looking in the wrong place?
>
>
> On Sat, Dec 28, 2013 at 7:36 AM, Chris Embree  wrote:
>
>> Maybe I'm just grouchy tonight... it seems all of these questions can be
>> answered by RTFM.
>> http://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
>>
>> What's the balance between encouraging learning by New to Hadoop users
>> and OMG!?
>>
>>
>> On Fri, Dec 27, 2013 at 8:58 PM, Vishnu Viswanath <
>> vishnu.viswanat...@gmail.com> wrote:
>>
>>> Hi all,
>>>
>>> Can someone tell me these:
>>>
>>> 1) which property in hadoop conf sets the time limit to consider a node
>>> as dead?
>>> 2) after detecting a node as dead, after how much time does hadoop
>>> replicates the block to another node?
>>> 3) if the dead node comes alive again, in how much time does hadoop
>>> identifies a block as over-replicated and when does it deletes that block?
>>>
>>> Regards,
>>>
>>
>>
>


Re: Map succeeds but reduce hangs

2013-12-31 Thread Hardik Pandya
what does your job log say? is your hdfs-site configured properly to find the
3 data nodes? this could very well be getting stuck in the shuffle phase

last thing to try: do stop-all and start-all help? as a last resort, try
formatting the namenode


On Tue, Dec 31, 2013 at 11:40 AM, navaz  wrote:

> Hi
>
>
> I am running Hadoop cluster with 1 name node and 3 data nodes.
>
> My HDFS looks like this.
>
> hduser@nm:/usr/local/hadoop$ hadoop fs -ls /user/hduser/getty/gutenberg
> Warning: $HADOOP_HOME is deprecated.
>
> Found 7 items
> -rw-r--r--   4 hduser supergroup 343691 2013-12-30 19:12
> /user/hduser/getty/gutenberg/pg132.txt
> -rw-r--r--   4 hduser supergroup 594933 2013-12-30 19:12
> /user/hduser/getty/gutenberg/pg1661.txt
> -rw-r--r--   4 hduser supergroup1945886 2013-12-30 19:12
> /user/hduser/getty/gutenberg/pg19699.txt
> -rw-r--r--   4 hduser supergroup 674570 2013-12-30 19:12
> /user/hduser/getty/gutenberg/pg20417.txt
> -rw-r--r--   4 hduser supergroup1573150 2013-12-30 19:12
> /user/hduser/getty/gutenberg/pg4300.txt
> -rw-r--r--   4 hduser supergroup1423803 2013-12-30 19:12
> /user/hduser/getty/gutenberg/pg5000.txt
> -rw-r--r--   4 hduser supergroup 393968 2013-12-30 19:12
> /user/hduser/getty/gutenberg/pg972.txt
> hduser@nm:/usr/local/hadoop$
>
> When I start the MapReduce wordcount program, map reaches 100% and reduce
> hangs at 14%.
>
> hduser@nm:~$ hadoop jar chiu-wordcount2.jar WordCount
> /user/hduser/getty/gutenberg /user/hduser/getty/gutenberg_out3
> Warning: $HADOOP_HOME is deprecated.
>
> 13/12/31 09:31:07 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 13/12/31 09:31:07 INFO input.FileInputFormat: Total input paths to process
> : 7
> 13/12/31 09:31:08 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 13/12/31 09:31:08 WARN snappy.LoadSnappy: Snappy native library not loaded
> 13/12/31 09:31:08 INFO mapred.JobClient: Running job: job_201312310929_0001
> 13/12/31 09:31:09 INFO mapred.JobClient:  map 0% reduce 0%
> 13/12/31 09:31:29 INFO mapred.JobClient:  map 14% reduce 0%
> 13/12/31 09:31:34 INFO mapred.JobClient:  map 32% reduce 0%
> 13/12/31 09:31:35 INFO mapred.JobClient:  map 75% reduce 0%
> 13/12/31 09:31:36 INFO mapred.JobClient:  map 90% reduce 0%
> 13/12/31 09:31:37 INFO mapred.JobClient:  map 99% reduce 0%
> 13/12/31 09:31:38 INFO mapred.JobClient:  map 100% reduce 0%
> 13/12/31 09:31:43 INFO mapred.JobClient:  map 100% reduce 14%
>
> 
>
> Could you please help me in resolving this issue.
>
>
> Thanks & Regards
> *Abdul Navaz*
>
>
>
>


Re: Request for a pointer to a MapReduce Program tutorial

2013-12-27 Thread Hardik Pandya
I recently blogged about it - hope it helps

http://letsdobigdata.wordpress.com/2013/12/07/running-hadoop-mapreduce-application-from-eclipse-kepler/

Regards,
Hardik

On Fri, Dec 27, 2013 at 6:53 AM, Sitaraman Vilayannur <
vrsitaramanietfli...@gmail.com> wrote:

> Hi,
>  Would much appreciate a pointer to a mapreduce tutorial which explains
> how i can run a simulated cluster of mapreduce nodes on a single PC and
> write a Java program with the MapReduce Paradigm.
>  Thanks very much.
> Sitaraman
>


Re: Error starting hadoop-2.2.0

2013-12-12 Thread Hardik Pandya
do you have multiple or mixed-version SLF4J jars on your classpath? how about
downgrading your SLF4J jars to 1.5.5 or 1.5.6?

please let me know how it works out for you, thanks

from the warning, the slf4j-api version does not match that of the binding


An SLF4J binding designates an artifact such as *slf4j-jdk14.jar* or
*slf4j-log4j12.jar* used to *bind* slf4j to an underlying logging
framework, say, java.util.logging and respectively log4j.

Mixing different versions of *slf4j-api.jar* and the SLF4J binding can
cause problems. For example, if you are using slf4j-api-1.7.5.jar, then you
should also use slf4j-simple-1.7.5.jar; using slf4j-simple-1.5.5.jar will
not work.

NOTE: From the client's perspective, all versions of slf4j-api are
compatible. Client code compiled with *slf4j-api-N.jar* will run perfectly
fine with *slf4j-api-M.jar* for any N and M. You only need to ensure that
the version of your binding matches that of the slf4j-api.jar. You do not
have to worry about the version of slf4j-api.jar used by a given dependency
in your project. You can always use any version of *slf4j-api.jar*, and as
long as the version of *slf4j-api.jar* and its binding match, you should be
fine.

At initialization time, if SLF4J suspects that there may be an api vs.
binding version mismatch problem, it will emit a warning about the
suspected mismatch.
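A quick, hedged way to spot mixed SLF4J versions on the classpath (the paths
below are examples; adjust to your install and build layout):

# list every slf4j artifact your process could load
find /path/to/hadoop-2.2.0 -name 'slf4j*.jar'
find /path/to/your/app/lib -name 'slf4j*.jar'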


On Thu, Dec 12, 2013 at 2:52 AM, Ahmed Eldawy  wrote:

> Hi,
>  I've been using Hadoop 1.x for a few months and it was working fine. Now,
> I want to migrate to hadoop-2.x but I'm having troubles starting it. In
> Hadoop 1.x, I used to configure core-site.xml and mapred-site.xml to be
> able to start master and slave on one machine.
> In hadoop-2.2.0, I followed the instructions on
> http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/SingleCluster.html
> Whenever I start yarn or HDFS I find this error in the logs
> java.lang.NoSuchMethodError:
> org.slf4j.helpers.MessageFormatter.format(Ljava/lang/String;Ljava/lang/Object;Ljava/lang/Object;)Lorg/slf4j/helpers/FormattingTuple;
> at org.slf4j.impl.Log4jLoggerAdapter.info
> (Log4jLoggerAdapter.java:345)
> at org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)
> at org.mortbay.log.Log.(Log.java:79)
> at org.mortbay.component.Container.add(Container.java:200)
> at org.mortbay.component.Container.update(Container.java:164)
> at org.mortbay.component.Container.update(Container.java:106)
> at org.mortbay.jetty.Server.setConnectors(Server.java:160)
> at org.mortbay.jetty.Server.addConnector(Server.java:134)
> at org.apache.hadoop.http.HttpServer.(HttpServer.java:241)
> at org.apache.hadoop.http.HttpServer.(HttpServer.java:174)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:305)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:664)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.(DataNode.java:259)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1727)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1642)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1665)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1837)
> at
> org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1858)
> 2013-12-12 10:20:12,541 INFO org.apache.hadoop.util.ExitUtil: Exiting with
> status 1
>
> Also there is a warning that seems to be related.
> SLF4J: The requested version 1.6.99 by your slf4j binding is not
> compatible with [1.5.5, 1.5.6]
> SLF4J: See http://www.slf4j.org/codes.html#version_mismatch for further
> details.
>
> Any suggestions of how to fix it?
>
>
> Best regards,
> Ahmed Eldawy
>


Re: Versioninfo and platformName issue.

2013-12-11 Thread Hardik Pandya
it's a classpath issue; also make sure your PATH is correct:

export HIVE_HOME=/home/username/yourhivedir
export PATH=$HIVE_HOME/bin:$PATH


On Wed, Dec 11, 2013 at 9:37 AM, Manish  wrote:

> Adam,
>
> Here is what i get when run $ hadoop version
>
> Hadoop 2.0.0-cdh4.4.0
> Subversion file:///var/lib/jenkins/workspace/generic-package-
> ubuntu64-12-04/CDH4.4.0-Packaging-Hadoop-2013-09-03_
> 18-48-35/hadoop-2.0.0+1475-1.cdh4.4.0.p0.23~precise/src/
> hadoop-common-project/hadoop-common -r c0eba6cd38c984557e96a16ccd7356
> b7de835e79
> Compiled by jenkins on Tue Sep  3 19:33:54 PDT 2013
> From source with checksum ac7e170aa709b3ace13dc5f775487180
> This command was run using /usr/lib/hadoop/hadoop-common-
> 2.0.0-cdh4.4.0.jar
>
> Do you have specific idea what could have gone wrong with Hadoop Classpath?
>
> Thank You,
> Manish.
>
>
> On Wednesday 11 December 2013 04:51 AM, Adam Kawa wrote:
>
>> $ hadoop version
>>
>
>


Re: multiusers in hadoop through LDAP

2013-12-10 Thread Hardik Pandya
have you looked at the hadoop.security.group.mapping.ldap.* properties in
hadoop-common's core-default.xml?

additional resources on LdapGroupsMapping may help
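A hedged core-site.xml sketch of the LDAP group-mapping properties (all values
are placeholders). Note that this maps already-authenticated users to groups via
LDAP; it does not by itself authenticate users or create their HDFS home
directories:

<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.LdapGroupsMapping</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.url</name>
  <value>ldap://ldap-server.example.com:389</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.bind.user</name>
  <value>cn=hadoop,ou=services,dc=example,dc=com</value>
</property>
<property>
  <name>hadoop.security.group.mapping.ldap.base</name>
  <value>dc=example,dc=com</value>
</property>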






On Tue, Dec 10, 2013 at 3:06 AM, YouPeng Yang wrote:

> Hi
>
>   In my cluster, I want to have multiple users for different purposes. The usual
> method is to add a user through the OS on the Hadoop NameNode.
>   I notice that Hadoop also supports LDAP. Could I add users through LDAP
> instead of through the OS, so that a user who is authenticated by LDAP can
> also access their HDFS directory?
>
>
> Regards
>