TestDFSIO error: libhdfs.so.1 does not exist
Hi all, I am benchmarking a Hadoop cluster with the hadoop-*-test.jar TestDFSIO benchmark, but it fails with the following error: File /usr/hadoop-0.20.2/libhdfs/libhdfs.so.1 does not exist. How can I solve this problem? Thanks!
Re: Hadoop Question
Nitin, On 2011/07/28 14:51, Nitin Khandelwal wrote: How can I determine if a file is being written to (by any thread) in HDFS? That information is exposed by the NameNode HTTP servlet. You can obtain it with the fsck tool (hadoop fsck /path/to/dir -openforwrite) or you can do an HTTP GET on http://namenode:port/fsck?path=/your/path&openforwrite=1 George
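For illustration, a minimal shell invocation of the fsck approach might look like this (the path is a placeholder); files that are still open should be flagged OPENFORWRITE in the fsck output:

hadoop fsck /path/to/dir -openforwrite | grep OPENFORWRITE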
RE: next gen map reduce
Does this mean 0.22.0 has reached stable and will be released as the stable version soon? --Aaron -Original Message- From: Robert Evans [mailto:ev...@yahoo-inc.com] Sent: Thursday, July 28, 2011 6:39 AM To: common-user@hadoop.apache.org Subject: Re: next gen map reduce It has not been introduced yet, if you are referring to MRv2. It is targeted to go into the 0.23 release of Hadoop, but is currently on the MR-279 branch, which should hopefully be merged to trunk in about a week. --Bobby On 7/28/11 7:31 AM, "real great.." wrote: In which Hadoop version is next gen introduced? -- Regards, R.V.
Re: File System Counters.
Harsh, if this is the case I don't understand something. If I see FILE_BYTES_READ to be non-zero for a map, the only thing I can assume is that it came from a spill during the sort phase. I have a 10-node cluster, and I ran TeraSort with size 100,000 bytes (1000 records). My io.sort.mb is 300 and io.sort.factor is 10. My mapred.child.java.opts is set to -Xmx512m. When I run this, given that everything fits into memory, I expected that there would be no FILE_BYTES_READ on the map side and no FILE_BYTES_WRITTEN on the reduce side. But I find that my FILE_BYTES_READ on the map side is 188,604 (HDFS_BYTES_READ is 149,686) and inexplicably SPILLED_RECORDS is 1000 for both map and reduce. So my questions have become two. 1. Why is my spill count 1000, given that io.sort.factor and io.sort.mb are 10 and 300 MB and I have 512 MB for each task? 2. Where are the numbers for FILE_BYTES_READ/WRITTEN coming from? TIA Raj From: Harsh J To: common-user@hadoop.apache.org; R V Sent: Thursday, July 28, 2011 12:03 AM Subject: Re: File System Counters. Raj, There is no overlap. Data read from HDFS FileSystem instances go to HDFS_BYTES_READ, and data read from Local FileSystem instances go to FILE_BYTES_READ. These are two different FileSystems, and have no overlap at all. On Thu, Jul 28, 2011 at 5:56 AM, R V wrote: > Hello > > I don't know if the question has been answered. I am trying to understand > the overlap between FILE_BYTES_READ and HDFS_BYTES_READ. What are the various > components that provide value to this counter? For example when I see > FILE_BYTES_READ for a specific task ( Map or Reduce ) , is it purely due to > the spill during the sort phase? If an HDFS read happens on a non-local node, does > the counter increase on the node where the data block resides? What happens > when the data is local? Does the counter increase for both HDFS_BYTES_READ > and FILE_BYTES_READ? From the values I am seeing, this looks to be the case > but I am not sure. > > I am not very fluent in Java , and hence I don't fully understand the source > . :-( > > Raj -- Harsh J
Re: Exporting From Hive
This is for the CLI. Use this: set hive.cli.print.header=true; Instead of doing this at the prompt every time, you can change your hive start command to: hive -hiveconf hive.cli.print.header=true But be careful with this setting, as quite a few commands stop working with an NPE with it on. I think describe doesn't work. -Ayon See My Photos on Flickr Also check out my Blog for answers to commonly asked questions. From: "Bale, Michael" To: common-user@hadoop.apache.org Sent: Thursday, July 28, 2011 8:54 AM Subject: Exporting From Hive Hi, I was wondering if anyone could help me? Does anyone know if it is possible to include the column headers in an output from a Hive query? I've had a look through the internet but can't seem to find an answer. If not, is it possible to export the result from a describe table query? If so I could then run that at the same time and join up at a future date. Thanks for your help -- *Mike Bale* Graduate Insight Analyst *Cable and Wireless Communications* Tel: +44 (0)20 7315 4437 www.cwc.com
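As a sketch of the non-interactive form, the setting can be combined with a query in a single invocation (the table and file names are made up for illustration):

hive -hiveconf hive.cli.print.header=true -e 'SELECT * FROM my_table LIMIT 10' > results.tsv

The first line of results.tsv should then carry the column headers.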
Re: cygwin not connecting to Hadoop server
Hi A Df, see inline at :: - Original Message - From: A Df Date: Wednesday, July 27, 2011 10:31 pm Subject: Re: cygwin not connecting to Hadoop server To: "common-user@hadoop.apache.org" > See inline at **. More questions and many Thanks :D > > > > > > > >From: Uma Maheswara Rao G 72686 > >To: common-user@hadoop.apache.org; A Df > >Cc: "common-user@hadoop.apache.org" > > >Sent: Wednesday, 27 July 2011, 17:31 > >Subject: Re: cygwin not connecting to Hadoop server > > > > > >Hi A Df, > > > >Did you format the NameNode first? > > > >** I had formatted it already but then I had reinstalled Java and > upgraded the plugins in cygwin so I reformatted it again. :D yes it > worked!! I am not sure of all the steps that got it to finally work :: Great :-) > but I will have to document it to prevent this headache in the > future. Although I typed ssh localhost too, so the question is, do I > need to type ssh localhost each time I need to run hadoop?? Also, :: Actually ssh is an authentication service. To save the authentication keys, you can run the commands below, which will set up key-based authentication. No need to give the password every time. ssh-keygen -t rsa -P "" cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys then execute /etc/init.d/sshd restart To connect to remote machines: cat /root/.ssh/id_rsa.pub | ssh root@<remote-host> 'cat > /root/.ssh/authorized_keys' then on the remote machine execute /etc/init.d/sshd restart > since I need to work with Eclipse maybe you can have a look at my > post about the plugin because I can't get the patch to work. The > subject is "Re: Cygwin not working with Hadoop and Eclipse Plugin". > I plan to read up on how to write programs for Hadoop. I am using > the tutorial at Yahoo but if you know of any really good books about > coding with Hadoop or just about understanding Hadoop then please > let me know. Hadoop: The Definitive Guide will be a great book for understanding Hadoop. Some sample programs are also available. To check the Hadoop internals: http://www.google.co.in/url?sa=t&source=web&cd=8&ved=0CEMQFjAH&url=http%3A%2F%2Findia.paxcel.net%3A6060%2FLargeDataMatters%2Fwp-content%2Fuploads%2F2010%2F09%2FHDFS1.pdf&rct=j&q=hadoop%20internals%20%2B%20part%201&ei=CqAxTtD8C4fprQfYq6DMCw&usg=AFQjCNGYMQbAeGP0cYGl4OJHseRsfEMGvQ&cad=rja > > > >Can you check the NN logs whether the NN is started or not? > >** I checked and the previous runs had some logs missing but now > the last one has all 5 logs and I got two conf files in xml. I > also copied out the other output files which I plan to examine. > Where do I specify the output extension format that I want for my > output file? I was hoping for a txt file but it shows the output in a > file with no extension even though I can read it in Notepad++. I > also got to view the web interface at: > >NameNode - http://localhost:50070/ > >JobTracker - http://localhost:50030/ > > > >** See below for the working version, finally!! 
Thanks > > > >Williams@TWilliams-LTPC ~/hadoop-0.20.2 > >$ bin/hadoop jar hadoop-0.20.2-examples.jar grep input > >11/07/27 17:42:20 INFO mapred.FileInputFormat: Total in > > > >11/07/27 17:42:20 INFO mapred.JobClient: Running job: j > >11/07/27 17:42:21 INFO mapred.JobClient: map 0% reduce > >11/07/27 17:42:33 INFO mapred.JobClient: map 15% reduc > >11/07/27 17:42:36 INFO mapred.JobClient: map 23% reduc > >11/07/27 17:42:39 INFO mapred.JobClient: map 38% reduc > >11/07/27 17:42:42 INFO mapred.JobClient: map 38% reduc > >11/07/27 17:42:45 INFO mapred.JobClient: map 53% reduc > >11/07/27 17:42:48 INFO mapred.JobClient: map 69% reduc > >11/07/27 17:42:51 INFO mapred.JobClient: map 76% reduc > >11/07/27 17:42:54 INFO mapred.JobClient: map 92% reduc > >11/07/27 17:42:57 INFO mapred.JobClient: map 100% redu > >11/07/27 17:43:06 INFO mapred.JobClient: map 100% redu > >11/07/27 17:43:09 INFO mapred.JobClient: Job complete: > >11/07/27 17:43:09 INFO mapred.JobClient: Counters: 18 > >11/07/27 17:43:09 INFO mapred.JobClient: Job Counters > >11/07/27 17:43:09 INFO mapred.JobClient: Launched r > >11/07/27 17:43:09 INFO mapred.JobClient: Launched m > >11/07/27 17:43:09 INFO mapred.JobClient: Data-local > >11/07/27 17:43:09 INFO mapred.JobClient: FileSystemCo > >11/07/27 17:43:09 INFO mapred.JobClient: FILE_BYTES > >11/07/27 17:43:09 INFO mapred.JobClient: HDFS_BYTES > >11/07/27 17:43:09 INFO mapred.JobClient: FILE_BYTES > >11/07/27 17:43:09 INFO mapred.JobClient: HDFS_BYTES > >11/07/27 17:43:09 INFO mapred.JobClient: Map-Reduce F > >11/07/27 17:43:09 INFO mapred.JobClient: Reduce inp > >11/07/27 17:43:09 INFO mapred.JobClient: Combine ou > >11/07/27 17:43:09 INFO mapred.JobClient: Map input > >11/07/27 17:43:09 INFO mapred.JobClient: Reduce shu > >11/07/27 17:43:09 INFO mapred.JobClient: Reduce out > >11/07/27 17:43:09 INFO mapred.JobClient: Spilled Re > >11/07/27 17:43:09 INFO mapred.JobClient: M
Re: OSX starting hadoop error
FYI, I logged a bug for this: https://issues.apache.org/jira/browse/HADOOP-7489 On Jul 28, 2011, at 11:36 AM, Bryan Keller wrote: > I am also seeing this error upon startup. I am guessing you are using OS X > Lion? It started happening to me after I upgraded to 10.7. Hadoop seems to > function properly despite this error showing up, though it is annoying. > > > On Jul 27, 2011, at 12:37 PM, Ben Cuthbert wrote: > >> All >> When starting hadoop on OSX I am getting this error. Is there a fix for it? >> >> java[22373:1c03] Unable to load realm info from SCDynamicStore >
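A workaround commonly reported for HADOOP-7489 (not confirmed in this thread) is to hand the JVM empty Kerberos realm settings, e.g. in conf/hadoop-env.sh:

export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="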
Re: OSX starting hadoop error
I am also seeing this error upon startup. I am guessing you are using OS X Lion? It started happening to me after I upgraded to 10.7. Hadoop seems to function properly despite this error showing up, though it is annoying. On Jul 27, 2011, at 12:37 PM, Ben Cuthbert wrote: > All > When starting hadoop on OSX I am getting this error. Is there a fix for it? > > java[22373:1c03] Unable to load realm info from SCDynamicStore
Exporting From Hive
Hi, I was wondering if anyone could help me? Does anyone know if it is possible to include the column headers in an output from a Hive query? I've had a look through the internet but can't seem to find an answer. If not, is it possible to export the result from a describe table query? If so I could then run that at the same time and join up at a future date. Thanks for your help -- *Mike Bale* Graduate Insight Analyst *Cable and Wireless Communications* Tel: +44 (0)20 7315 4437 www.cwc.com
Unit testing strategy for map/reduce methods
I've been playing with unit testing strategies for my Hadoop work. A discussion of techniques and a link to example code here: http://cornercases.wordpress.com/2011/07/28/unit-testing-mapreduce-with-overridden-write-methods/ .
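Independent of the linked post, one hedged sketch of the general idea: a map method written against the old (org.apache.hadoop.mapred) API can be driven directly from a plain JUnit test with a fake OutputCollector. The WordCountMapper here is an invented stand-in for whatever map logic is under test:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

public class WordCountMapperTest {

  // A small mapper to test; stands in for your real map logic.
  static class WordCountMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final IntWritable one = new IntWritable(1);
    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        output.collect(new Text(itr.nextToken()), one);
      }
    }
  }

  // Fake collector that simply records everything the mapper emits.
  static class CollectingCollector<K, V> implements OutputCollector<K, V> {
    final List<K> keys = new ArrayList<K>();
    final List<V> values = new ArrayList<V>();
    public void collect(K key, V value) {
      keys.add(key);
      values.add(value);
    }
  }

  @Test
  public void emitsOneCountPerToken() throws IOException {
    WordCountMapper mapper = new WordCountMapper();
    CollectingCollector<Text, IntWritable> out =
        new CollectingCollector<Text, IntWritable>();
    // Reporter is unused by this mapper, so Reporter.NULL keeps the test simple.
    mapper.map(new LongWritable(0), new Text("foo bar foo"), out, Reporter.NULL);
    assertEquals(3, out.keys.size());
    assertEquals(new Text("foo"), out.keys.get(0));
  }
}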
Re: Replication and failure
On Thu, Jul 28, 2011 at 12:17 AM, Harsh J wrote: > Mohit, > > I believe Tom's book (Hadoop: The Definitive Guide) covers this > precisely well. Perhaps others too. > > Replication is a best-effort sort of thing. If 2 nodes are all that is > available, then two replicas are written and one is left to the > replica monitor service to replicate later when possible (leading to an > underreplicated write for the moment). The scenario (with default > configs) would only fail if you have 0 DataNodes 'available' to write > to. Thanks Harsh. I think you answered my question. I thought that replication of 3 is a must. And for that you really need at least 4 nodes so that if one of the nodes dies it can still write to 3 nodes. I am assuming writes to replica nodes are always synchronous and not eventually consistent. > > Or are you asking about what happens when a DN fails during a write operation? I am assuming there will be some errors in this case. > > On Thu, Jul 28, 2011 at 5:08 AM, Mohit Anchlia wrote: >> Just trying to understand what happens if there are 3 nodes with >> replication set to 3 and one node fails. Does it fail the writes too? >> >> If there is a link that I can look at will be great. I tried searching >> but didn't see any definitive answer. >> >> Thanks, >> Mohit >> > > > > -- > Harsh J >
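For reference, a minimal hdfs-site.xml sketch of the knobs involved: dfs.replication is the target replica count, while dfs.replication.min (default 1) is how many replicas must actually be written for the write itself to succeed; anything between the two is repaired later by the replica monitor.

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
<property>
  <name>dfs.replication.min</name>
  <value>1</value>
</property>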
Re: HBase Mapreduce cannot find Map class
See http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/package-summary.html#package_description for some help. St.Ack

On Thu, Jul 28, 2011 at 4:04 AM, air wrote:
> -- Forwarded message --
> From: air
> Date: 2011/7/28
> Subject: HBase Mapreduce cannot find Map class
> To: CDH Users
>
> import java.io.IOException;
> import java.text.ParseException;
> import java.text.SimpleDateFormat;
> import java.util.Date;
>
> import org.apache.hadoop.conf.Configured;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.hbase.HBaseConfiguration;
> import org.apache.hadoop.hbase.client.HTable;
> import org.apache.hadoop.hbase.client.Put;
> import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
> import org.apache.hadoop.io.LongWritable;
> import org.apache.hadoop.io.Text;
> import org.apache.hadoop.mapred.JobClient;
> import org.apache.hadoop.mapred.JobConf;
> import org.apache.hadoop.mapred.MapReduceBase;
> import org.apache.hadoop.mapred.Mapper;
> import org.apache.hadoop.mapred.OutputCollector;
> import org.apache.hadoop.mapred.Reporter;
> import org.apache.hadoop.mapred.FileInputFormat;
> import org.apache.hadoop.mapred.lib.NullOutputFormat;
> import org.apache.hadoop.util.Tool;
> import org.apache.hadoop.util.ToolRunner;
>
> public class LoadToHBase extends Configured implements Tool {
>     public static class XMap extends MapReduceBase implements
>             Mapper<LongWritable, Text, LongWritable, Text> {
>         private JobConf conf;
>         private HTable table;
>
>         @Override
>         public void configure(JobConf conf) {
>             this.conf = conf;
>             try {
>                 this.table = new HTable(new HBaseConfiguration(conf), "observations");
>             } catch (IOException e) {
>                 throw new RuntimeException("Failed HTable construction", e);
>             }
>         }
>
>         @Override
>         public void close() throws IOException {
>             super.close();
>             table.close();
>         }
>
>         public void map(LongWritable key, Text value,
>                 OutputCollector<LongWritable, Text> output, Reporter reporter)
>                 throws IOException {
>             String[] valuelist = value.toString().split("\t");
>             SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
>             Date addtime = null; // user registration time
>             Date ds = null;
>             Long delta_days = null;
>             String uid = valuelist[0];
>             try {
>                 addtime = sdf.parse(valuelist[1]);
>             } catch (ParseException e) {
>                 e.printStackTrace();
>             }
>
>             String ds_str = conf.get("load.hbase.ds", null);
>             if (ds_str != null) {
>                 try {
>                     ds = sdf.parse(ds_str);
>                 } catch (ParseException e) {
>                     e.printStackTrace();
>                 }
>             } else {
>                 ds_str = "2011-07-28";
>             }
>
>             if (addtime != null && ds != null) {
>                 delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 * 60 * 1000);
>             }
>
>             if (delta_days != null) {
>                 byte[] rowKey = uid.getBytes();
>                 Put p = new Put(rowKey);
>                 p.add("content".getBytes(), "attr1".getBytes(), delta_days.toString().getBytes());
>                 table.put(p);
>             }
>         }
>     }
>
>     public static void main(String[] args) throws Exception {
>         int exitCode = ToolRunner.run(new HBaseConfiguration(), new LoadToHBase(), args);
>         System.exit(exitCode);
>     }
>
>     @Override
>     public int run(String[] args) throws Exception {
>         JobConf conf = new JobConf(getClass());
>         TableMapReduceUtil.addDependencyJars(conf);
>         FileInputFormat.addInputPath(conf, new Path(args[0]));
>         conf.setJobName("LoadToHBase");
>         conf.setJarByClass(getClass());
>         conf.setMapperClass(XMap.class);
>         conf.setNumReduceTasks(0);
>         conf.setOutputFormat(NullOutputFormat.class);
>         JobClient.runJob(conf);
>         return 0;
>     }
> }
>
> execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/
> and it says:
>
> ..
> 11/07/28 17:20:29 INFO mapred.JobClient: Task Id :
> attempt_201107261532_2625_m_04_1, Status : FAILED
> java.lang.RuntimeException: Error in configuring object
> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.j
Re: Class loading problem
On Thu, 28 Jul 2011 10:05:57 -0400, "Kumar, Ranjan" wrote: > I have a class to define data I am reading from a MySQL database. > According to online tutorials I created a class called MyRecord and > extended it from Writable, DBWritable. While running it with hadoop I get a > NoSuchMethodException: dataTest$MyRecord.<init>() Hadoop needs a no-args constructor to build the object, which it then fills in by using readFields(). Many classes come with a default no-args constructor, which basically defers to the no-args constructor from Object, or another ancestor class. HOWEVER, if you defined another constructor that takes arguments, you've implicitly removed the default no-args constructor on your class. You need to define one explicitly, which Hadoop can use to build your objects. hth
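A minimal sketch of such a class (the field names are invented for illustration; note the explicit no-args constructor):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class MyRecord implements Writable, DBWritable {
  private long id;
  private String name;

  // Explicit no-args constructor so Hadoop can instantiate the class
  // reflectively and then populate it via readFields().
  public MyRecord() {}

  // The extra constructor that implicitly removed the default one.
  public MyRecord(long id, String name) {
    this.id = id;
    this.name = name;
  }

  public void write(DataOutput out) throws IOException {
    out.writeLong(id);
    out.writeUTF(name);
  }

  public void readFields(DataInput in) throws IOException {
    id = in.readLong();
    name = in.readUTF();
  }

  public void write(PreparedStatement stmt) throws SQLException {
    stmt.setLong(1, id);
    stmt.setString(2, name);
  }

  public void readFields(ResultSet rs) throws SQLException {
    id = rs.getLong(1);
    name = rs.getString(2);
  }
}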
Class loading problem
I have a class to define data I am reading from a MySQL database. According to online tutorials I created a class called MyRecord and extended it from Writable, DBWritable. While running it with hadoop I get a NoSuchMethodException: dataTest$MyRecord.<init>() I am using 0.21.0 Thanks for your help Ranjan
RE: Error in 9000 and 9001 port in hadoop-0.20.2
Start the namenode [set fs.default.name to hdfs://192.168.1.101:9000] and check your netstat report [netstat -nlp] to see which port and IP it is binding to. Ideally, 9000 should be bound to 192.168.1.101. If yes, configure the same IP in the slaves as well. Otherwise, we may need to revisit your configs once. To use the hostname, you should have a hostname-IP mapping in the /etc/hosts file on the master as well as the slaves. -Original Message- From: Doan Ninh [mailto:uitnetw...@gmail.com] Sent: Thursday, July 28, 2011 6:45 PM To: common-user@hadoop.apache.org Subject: Re: Error in 9000 and 9001 port in hadoop-0.20.2 I changed fs.default.name to hdfs://192.168.1.101:9000. But, the same error as before. I need help On Thu, Jul 28, 2011 at 7:45 PM, Nitin Khandelwal < nitin.khandel...@germinait.com> wrote: > Please change your fs.default.name to hdfs://192.168.1.101:9000 > Thanks, > Nitin > > On 28 July 2011 17:46, Doan Ninh wrote: > > > The first time, I used *hadoop-cluster-1* for 192.168.1.101. > > That is the hostname of the master node. > > But the same error occurs. > > How can I fix it? > > > > On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak > > wrote: > > > > > I had issues using IP addresses in XML files . You can try to use host > names > > > in > > > the place of IP addresses . > > > > > > On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh > wrote: > > > > > > > Hi, > > > > > > > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > > > > On the master node (192.168.1.101), I configure fs.default.name = > > > hdfs:// > > > > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > > > > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" > on > > > the > > > > master node > > > > Everything is ok, but the slaves can't connect to the master on 9000, > 9001 > > > > port. > > > > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > > > > "connection refused" > > > > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. > The > > > > result is connected. > > > > But, on the master node, I telnet to 192.168.1.101:9000 => > Connection > > > > Refused > > > > > > > > Can somebody help me? > > > > > > > > > > > > > -- > > > Nitin Khandelwal >
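For illustration, with the hostname hadoop-cluster-1 used earlier in this thread, the two pieces would look roughly like this (the IP and values are examples only):

# /etc/hosts on the master and every slave
192.168.1.101   hadoop-cluster-1

<!-- core-site.xml on every node -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://hadoop-cluster-1:9000</value>
</property>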
Re: Hadoop-streaming using binary executable c program
I am not completely sure what you are getting at. It looks like the output of your c program is (and this is just a guess) NOTE: \t stands for the tab character, and in streaming it is used to separate the key from the value; \n stands for the newline character and is used to separate individual records. <sequence>\t<structure>\n <sequence>\t<structure>\n <sequence>\t<structure>\n ... And you want the output to look like <sequence>\t<structure>\n You could use a reduce to do this, but the issue here is with the shuffle in between the maps and the reduces. The shuffle will group by the key to send to the reducers and then sort by the key. So in reality your map output looks something like FROM MAP 1: <sequence>\t<structure>\n <sequence>\t<structure>\n FROM MAP 2: <sequence>\t<structure>\n <sequence>\t<structure>\n FROM MAP 3: <sequence>\t<structure>\n <sequence>\t<structure>\n If you send it to a single reducer (the only way to get a single file) then the input to the reducer will be sorted alphabetically by the RNA, and the order of the input will be lost. You can work around this by giving each line a unique number that is in the order you want it to be output. But doing this would require you to write some code. I would suggest that you do it with a small shell script after all the maps have completed to splice them together. -- Bobby On 7/27/11 2:55 PM, "Daniel Yehdego" wrote: Hi Bobby, I just want to ask you if there is a way of using a reducer or something like concatenation to glue my outputs from the mapper and output them as a single file and segment of the predicted RNA 2D structure? FYI: I have used a reducer NONE before: HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose and a sample of my output using the mapper of two different slave nodes looks like this : AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGC and [...(((...))).]. (-13.46) GGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU .(((.((......).. (-11.00) and I want to concatenate and output them as a single predicted RNA sequence structure: AUACCCGCAAAUUCACUCAAAUCUGUAAUAGGUUUGUCAUUCAAAUCUAGUGCAAAUAUUACUUUCGCCAAUUAGGUAUAAUAAUGGUAAGCGGGACAAGACUCGACAUUUGAUACACUAUUUAUCAAUGGAUGUCUUCU [...(((...))).]..(((.((......).. Regards, Daniel T. Yehdego Computational Science Program University of Texas at El Paso, UTEP dtyehd...@miners.utep.edu > From: dtyehd...@miners.utep.edu > To: common-user@hadoop.apache.org > Subject: RE: Hadoop-streaming using binary executable c program > Date: Tue, 26 Jul 2011 16:23:10 + > > > Good afternoon Bobby, > > Thanks so much, now it's working excellently. And the speed is also reasonable. > Once again thank you. > > Regards, > > Daniel T. Yehdego > Computational Science Program > University of Texas at El Paso, UTEP > dtyehd...@miners.utep.edu > > > From: ev...@yahoo-inc.com > > To: common-user@hadoop.apache.org > > Date: Mon, 25 Jul 2011 14:47:34 -0700 > > Subject: Re: Hadoop-streaming using binary executable c program > > > > This is likely to be slow and it is not ideal. The ideal would be to > > modify pknotsRG to be able to read from stdin, but that may not be possible. 
> >
> > The shell script would probably look something like the following:
> >
> > #!/bin/sh
> > rm -f temp.txt;
> > while read line
> > do
> >   echo $line >> temp.txt;
> > done
> > exec pknotsRG temp.txt;
> >
> > Place it in a file, say hadoopPknotsRG. Then you probably want to run
> >
> > chmod +x hadoopPknotsRG
> >
> > After that you want to test it with
> >
> > hadoop fs -cat /user/yehdego/RNAData/RF00028_B.bpseqL3G5_seg_Centered_Method.txt | head -2 | ./hadoopPknotsRG
> >
> > If that works then you can try it with Hadoop streaming
> >
> > HADOOP_HOME$ bin/hadoop jar /data/yehdego/hadoop-0.20.2/hadoop-0.20.2-streaming.jar -mapper ./hadoopPknotsRG -file /data/yehdego/hadoop-0.20.2/pknotsRG -file /data/yehdego/hadoop-0.20.2/hadoopPknotsRG -input /user/yehdego/RF00028_B.bpseqL3G5_seg_Centered_Method.txt -output /user/yehdego/RF-out -reducer NONE -verbose
> >
> > --Bobby
> >
> > On 7/25/11 3:37 PM, "Daniel Yehdego" wrote:
> >
> > Good afternoon Bobby,
> >
> > Thanks, you gave me a great help in finding out what the problem was. After I put in the command line you suggested, I found out that there was a segmentation error. The binary executable program pknotsRG only reads a file with a sequence in it. This means there should be a shell script, as you have said, that will take the data coming from stdin and write it to a temporary file. Any idea on how to do this job i
Re: next gen map reduce
It has not been introduced yet, if you are referring to MRv2. It is targeted to go into the 0.23 release of Hadoop, but is currently on the MR-279 branch, which should hopefully be merged to trunk in about a week. --Bobby On 7/28/11 7:31 AM, "real great.." wrote: In which Hadoop version is next gen introduced? -- Regards, R.V.
Re: /tmp/hadoop-oracle/dfs/name is in an inconsistent state
Hi, Before starting, you need to format the namenode: ./hdfs namenode -format Then these directories will be created. The respective configuration property is 'dfs.namenode.name.dir'. Default configurations exist in hdfs-default.xml. If you want to configure your own directory path, you can add the above property in the hdfs-site.xml file. Regards, Uma Mahesh - Original Message - From: "Daniel,Wu" Date: Thursday, July 28, 2011 6:51 pm Subject: /tmp/hadoop-oracle/dfs/name is in an inconsistent state To: common-user@hadoop.apache.org > When I started hadoop, the namenode failed to start up because of > the following error. The strange thing is that it says /tmp/hadoop- > oracle/dfs/name is inconsistent, but I don't think I have > configured it as /tmp/hadoop-oracle/dfs/name. Where should I check > for the related configuration? > 2011-07-28 21:07:35,383 ERROR > org.apache.hadoop.hdfs.server.namenode.NameNode: > org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory > /tmp/hadoop-oracle/dfs/name is in an inconsistent state: storage directory > does not exist or is not accessible. > >
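The /tmp location comes from the defaults: dfs.name.dir defaults to ${hadoop.tmp.dir}/dfs/name, and hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}, hence /tmp/hadoop-oracle/dfs/name. A sketch of pointing it at a persistent directory in hdfs-site.xml (the path is an example; the property is dfs.name.dir in 0.20 releases and dfs.namenode.name.dir in later versions):

<property>
  <name>dfs.name.dir</name>
  <value>/var/lib/hadoop/dfs/name</value>
</property>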
/tmp/hadoop-oracle/dfs/name is in an inconsistent state
When I started hadoop, the namenode failed to start up because of the following error. The strange thing is that it says /tmp/hadoop-oracle/dfs/name is inconsistent, but I don't think I have configured it as /tmp/hadoop-oracle/dfs/name. Where should I check for the related configuration? 2011-07-28 21:07:35,383 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /tmp/hadoop-oracle/dfs/name is in an inconsistent state: storage directory does not exist or is not accessible.
Re: Error in 9000 and 9001 port in hadoop-0.20.2
I changed fs.default.name to hdfs://192.168.1.101:9000. But, the same error as before. I need help On Thu, Jul 28, 2011 at 7:45 PM, Nitin Khandelwal < nitin.khandel...@germinait.com> wrote: > Please change your fs.default.name to hdfs://192.168.1.101:9000 > Thanks, > Nitin > > On 28 July 2011 17:46, Doan Ninh wrote: > > > The first time, I used *hadoop-cluster-1* for 192.168.1.101. > > That is the hostname of the master node. > > But the same error occurs. > > How can I fix it? > > > > On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak > > wrote: > > > > > I had issues using IP addresses in XML files . You can try to use host > names > > > in > > > the place of IP addresses . > > > > > > On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh > wrote: > > > > > > > Hi, > > > > > > > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > > > > On the master node (192.168.1.101), I configure fs.default.name = > > > hdfs:// > > > > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > > > > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" > on > > > the > > > > master node > > > > Everything is ok, but the slaves can't connect to the master on 9000, > 9001 > > > > port. > > > > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > > > > "connection refused" > > > > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. > The > > > > result is connected. > > > > But, on the master node, I telnet to 192.168.1.101:9000 => > Connection > > > > Refused > > > > > > > > Can somebody help me? > > > > > > > > > > > > > -- > > > Nitin Khandelwal >
Re: next gen map reduce
It's currently still on the MR-279 branch - http://svn.apache.org/viewvc/hadoop/common/branches/MR-279/. It is planned to be merged to trunk soon. Tom On 7/28/11 7:31 AM, "real great.." wrote: > In which Hadoop version is next gen introduced?
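For anyone wanting to look at the branch locally, a checkout would be along these lines:

svn checkout http://svn.apache.org/repos/asf/hadoop/common/branches/MR-279/ MR-279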
Re: Error in 9000 and 9001 port in hadoop-0.20.2
Please change your fs.default.name to hdfs://192.168.1.101:9000 Thanks, Nitin On 28 July 2011 17:46, Doan Ninh wrote: > The first time, I used *hadoop-cluster-1* for 192.168.1.101. > That is the hostname of the master node. > But the same error occurs. > How can I fix it? > > On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak > wrote: > > > I had issues using IP addresses in XML files . You can try to use host names > > in > > the place of IP addresses . > > > > On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh wrote: > > > > > Hi, > > > > > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > > > On the master node (192.168.1.101), I configure fs.default.name = > > hdfs:// > > > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > > > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" on > > the > > > master node > > > Everything is ok, but the slaves can't connect to the master on 9000, > 9001 > > > port. > > > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > > > "connection refused" > > > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The > > > result is connected. > > > But, on the master node, I telnet to 192.168.1.101:9000 => Connection > > > Refused > > > > > > Can somebody help me? > > > > > > -- Nitin Khandelwal
next gen map reduce
In which Hadoop version is next gen introduced? -- Regards, R.V.
Re: Hadoop Question
How about having the slave write to a temp file first, then move it into the folder the master is monitoring after the file is closed? -Joey On Jul 27, 2011, at 22:51, Nitin Khandelwal wrote: > Hi All, > > How can I determine if a file is being written to (by any thread) in HDFS? I > have a continuous process on the master node, which is tracking a particular > folder in HDFS for files to process. On the slave nodes, I am creating files > in the same folder using the following code : > > At the slave node: > > import org.apache.commons.io.IOUtils; > import org.apache.hadoop.fs.FileSystem; > import java.io.OutputStream; > > OutputStream oStream = fileSystem.create(path); > IOUtils.write(<data>, oStream); > IOUtils.closeQuietly(oStream); > > > At the master node, > I am getting the earliest modified file in the folder. At times when I try > reading the file, I get nothing in the file, mostly because the slave might > still be finishing writing to the file. Is there any way to somehow tell > the master that the slave is still writing to the file and to check the > file some time later for actual content. > > Thanks, > -- > > > Nitin Khandelwal
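A hedged sketch of that idea on the writing side (the class and variable names are invented; HDFS renames are atomic at the NameNode, so the watcher never sees a half-written file):

import java.io.IOException;
import java.io.OutputStream;
import org.apache.commons.io.IOUtils;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AtomicWrite {
  // Write under a temporary name first, then rename to the final name
  // that the master is watching for.
  public static void writeThenRename(FileSystem fs, String data,
      Path tmp, Path dst) throws IOException {
    OutputStream oStream = fs.create(tmp);
    try {
      IOUtils.write(data, oStream);
    } finally {
      IOUtils.closeQuietly(oStream);
    }
    if (!fs.rename(tmp, dst)) {
      throw new IOException("rename failed: " + tmp + " -> " + dst);
    }
  }
}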
Re: Error in 9000 and 9001 port in hadoop-0.20.2
The first time, I used *hadoop-cluster-1* for 192.168.1.101. That is the hostname of the master node. But the same error occurs. How can I fix it? On Thu, Jul 28, 2011 at 7:07 PM, madhu phatak wrote: > I had issues using IP addresses in XML files . You can try to use host names > in > the place of IP addresses . > > On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh wrote: > > > Hi, > > > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > > On the master node (192.168.1.101), I configure fs.default.name = > hdfs:// > > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" on > the > > master node > > Everything is ok, but the slaves can't connect to the master on 9000, 9001 > > port. > > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > > "connection refused" > > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The > > result is connected. > > But, on the master node, I telnet to 192.168.1.101:9000 => Connection > > Refused > > > > Can somebody help me? > > >
Re: Error in 9000 and 9001 port in hadoop-0.20.2
I had issues using IP addresses in XML files. You can try to use hostnames in place of IP addresses. On Thu, Jul 28, 2011 at 5:22 PM, Doan Ninh wrote: > Hi, > > I run Hadoop in 4 Ubuntu 11.04 on VirtualBox. > On the master node (192.168.1.101), I configure fs.default.name = hdfs:// > 127.0.0.1:9000. Then I configure everything on the 3 other nodes > When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" on the > master node > Everything is ok, but the slaves can't connect to the master on 9000, 9001 > port. > I manually telnet to 192.168.1.101 on 9000, 9001. And the result is > "connection refused" > Then, i'm on the master node, telnet to localhost, 127.0.0.1:9000. The > result is connected. > But, on the master node, I telnet to 192.168.1.101:9000 => Connection > Refused > > Can somebody help me? >
Error in 9000 and 9001 port in hadoop-0.20.2
Hi, I run Hadoop in 4 Ubuntu 11.04 VMs on VirtualBox. On the master node (192.168.1.101), I configure fs.default.name = hdfs://127.0.0.1:9000. Then I configure everything on the 3 other nodes. When I start the cluster by entering "$HADOOP_HOME/bin/start-all.sh" on the master node, everything is ok, but the slaves can't connect to the master on ports 9000 and 9001. I manually telnet to 192.168.1.101 on 9000 and 9001, and the result is "connection refused". Then, on the master node, I telnet to localhost, 127.0.0.1:9000. The result is connected. But, on the master node, when I telnet to 192.168.1.101:9000 => Connection Refused Can somebody help me?
RE: Reader/Writer problem in HDFS
No such API as far as I know. copyFromLocal is one such operation, but that may not fit your scenario I guess. --Laxman -Original Message- From: Meghana [mailto:meghana.mara...@germinait.com] Sent: Thursday, July 28, 2011 4:32 PM To: hdfs-u...@hadoop.apache.org; lakshman...@huawei.com Cc: common-user@hadoop.apache.org Subject: Re: Reader/Writer problem in HDFS Thanks Laxman! That would definitely help things. :) Is there a better FileSystem/other method call to create a file in one go (i.e. atomic I guess?), without having to call create() and then write to the stream? ..meghana On 28 July 2011 16:12, Laxman wrote: > One approach is to use some ".tmp" extension while writing. Once the write > is completed, rename back to the original file name. Also, the reader has to filter > out ".tmp" files. > > This will ensure the reader will not pick up the partial files. > > We do have a similar scenario where the above-mentioned approach resolved the issue. > > -Original Message- > From: Meghana [mailto:meghana.mara...@germinait.com] > Sent: Thursday, July 28, 2011 1:38 PM > To: common-user; hdfs-u...@hadoop.apache.org > Subject: Reader/Writer problem in HDFS > > Hi, > > We have a job where the map tasks are given the path to an output folder. > Each map task writes a single file to that folder. There is no reduce > phase. > There is another thread, which constantly looks for new files in the output > folder. If found, it persists the contents to index, and deletes the file. > > We use this code in the map task: > OutputStream oStream = null; > try { > oStream = fileSystem.create(path); > IOUtils.write("xyz", oStream); > } finally { > IOUtils.closeQuietly(oStream); > } > > The problem: Sometimes the reader thread sees & tries to read a file which > is not yet fully written to HDFS (or the checksum is not written yet, etc), > and throws an error. Is it possible to write an HDFS file in such a way > that > it won't be visible until it is fully written? > > We use Hadoop 0.20.203. > > Thanks, > > Meghana > >
Fwd: HBase Mapreduce cannot find Map class
-- Forwarded message --
From: air
Date: 2011/7/28
Subject: HBase Mapreduce cannot find Map class
To: CDH Users

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.mapred.TableMapReduceUtil;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LoadToHBase extends Configured implements Tool {
    public static class XMap extends MapReduceBase implements
            Mapper<LongWritable, Text, LongWritable, Text> {
        private JobConf conf;
        private HTable table;

        @Override
        public void configure(JobConf conf) {
            this.conf = conf;
            try {
                this.table = new HTable(new HBaseConfiguration(conf), "observations");
            } catch (IOException e) {
                throw new RuntimeException("Failed HTable construction", e);
            }
        }

        @Override
        public void close() throws IOException {
            super.close();
            table.close();
        }

        public void map(LongWritable key, Text value,
                OutputCollector<LongWritable, Text> output, Reporter reporter)
                throws IOException {
            String[] valuelist = value.toString().split("\t");
            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            Date addtime = null; // user registration time
            Date ds = null;
            Long delta_days = null;
            String uid = valuelist[0];
            try {
                addtime = sdf.parse(valuelist[1]);
            } catch (ParseException e) {
                e.printStackTrace();
            }

            String ds_str = conf.get("load.hbase.ds", null);
            if (ds_str != null) {
                try {
                    ds = sdf.parse(ds_str);
                } catch (ParseException e) {
                    e.printStackTrace();
                }
            } else {
                ds_str = "2011-07-28";
            }

            if (addtime != null && ds != null) {
                delta_days = (ds.getTime() - addtime.getTime()) / (24 * 60 * 60 * 1000);
            }

            if (delta_days != null) {
                byte[] rowKey = uid.getBytes();
                Put p = new Put(rowKey);
                p.add("content".getBytes(), "attr1".getBytes(), delta_days.toString().getBytes());
                table.put(p);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new HBaseConfiguration(), new LoadToHBase(), args);
        System.exit(exitCode);
    }

    @Override
    public int run(String[] args) throws Exception {
        JobConf conf = new JobConf(getClass());
        TableMapReduceUtil.addDependencyJars(conf);
        FileInputFormat.addInputPath(conf, new Path(args[0]));
        conf.setJobName("LoadToHBase");
        conf.setJarByClass(getClass());
        conf.setMapperClass(XMap.class);
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(NullOutputFormat.class);
        JobClient.runJob(conf);
        return 0;
    }
}

execute it using hbase LoadToHBase /user/hive/warehouse/datamining.db/xxx/ and it says:

..
11/07/28 17:20:29 INFO mapred.JobClient: Task Id : attempt_201107261532_2625_m_04_1, Status : FAILED
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
C
Re: Reader/Writer problem in HDFS
Thanks Laxman! That would definitely help things. :) Is there a better FileSystem/other method call to create a file in one go (i.e. atomic I guess?), without having to call create() and then write to the stream? ..meghana On 28 July 2011 16:12, Laxman wrote: > One approach is to use some ".tmp" extension while writing. Once the write > is completed, rename back to the original file name. Also, the reader has to filter > out ".tmp" files. > > This will ensure the reader will not pick up the partial files. > > We do have a similar scenario where the above-mentioned approach resolved the issue. > > -Original Message- > From: Meghana [mailto:meghana.mara...@germinait.com] > Sent: Thursday, July 28, 2011 1:38 PM > To: common-user; hdfs-u...@hadoop.apache.org > Subject: Reader/Writer problem in HDFS > > Hi, > > We have a job where the map tasks are given the path to an output folder. > Each map task writes a single file to that folder. There is no reduce > phase. > There is another thread, which constantly looks for new files in the output > folder. If found, it persists the contents to index, and deletes the file. > > We use this code in the map task: > OutputStream oStream = null; > try { > oStream = fileSystem.create(path); > IOUtils.write("xyz", oStream); > } finally { > IOUtils.closeQuietly(oStream); > } > > The problem: Sometimes the reader thread sees & tries to read a file which > is not yet fully written to HDFS (or the checksum is not written yet, etc), > and throws an error. Is it possible to write an HDFS file in such a way > that > it won't be visible until it is fully written? > > We use Hadoop 0.20.203. > > Thanks, > > Meghana > >
RE: Reader/Writer problem in HDFS
One approach is to use some ".tmp" extension while writing. Once the write is completed, rename back to the original file name. Also, the reader has to filter out ".tmp" files. This will ensure the reader will not pick up the partial files. We do have a similar scenario where the above-mentioned approach resolved the issue. -Original Message- From: Meghana [mailto:meghana.mara...@germinait.com] Sent: Thursday, July 28, 2011 1:38 PM To: common-user; hdfs-u...@hadoop.apache.org Subject: Reader/Writer problem in HDFS Hi, We have a job where the map tasks are given the path to an output folder. Each map task writes a single file to that folder. There is no reduce phase. There is another thread, which constantly looks for new files in the output folder. If found, it persists the contents to index, and deletes the file. We use this code in the map task: OutputStream oStream = null; try { oStream = fileSystem.create(path); IOUtils.write("xyz", oStream); } finally { IOUtils.closeQuietly(oStream); } The problem: Sometimes the reader thread sees & tries to read a file which is not yet fully written to HDFS (or the checksum is not written yet, etc), and throws an error. Is it possible to write an HDFS file in such a way that it won't be visible until it is fully written? We use Hadoop 0.20.203. Thanks, Meghana
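On the reading side, a sketch of skipping the in-progress files when listing the folder (the class and names are illustrative):

import java.io.IOException;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathFilter;

public class CompletedFiles {
  // Skip files still carrying the ".tmp" suffix, i.e. still being written.
  private static final PathFilter COMPLETED = new PathFilter() {
    public boolean accept(Path p) {
      return !p.getName().endsWith(".tmp");
    }
  };

  public static FileStatus[] listCompleted(FileSystem fs, Path folder)
      throws IOException {
    return fs.listStatus(folder, COMPLETED);
  }
}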
Hadoop output contains _temporary
Hi all, In my recent work in hadoop, I find that the output dir contains both _SUCCESS and _temporary. And then the next job fails because the input path contains _temporary. How does this happen? And how can I avoid it? Thanks for your replies. liuliu --
Why hadoop 0.20.203 unit test failed
Hi all, I'm trying to compile and unit test hadoop 0.20.203, but met with almost the same problem as a previous discussion on the mailing list (http://mail-archives.apache.org/mod_mbox/hadoop-general/201105.mbox/%3CBANLkTim68H=8ngbfzmsvrqob9pmy7fv...@mail.gmail.com%3E). Even after setting umask to 022, I still have the test cases listed below failing.

Test org.apache.hadoop.mapred.TestJobTrackerRestart FAILED
Test org.apache.hadoop.mapred.TestJobTrackerRestartWithLostTracker FAILED
Test org.apache.hadoop.mapred.TestJobTrackerSafeMode FAILED
Test org.apache.hadoop.filecache.TestMRWithDistributedCache FAILED
Test org.apache.hadoop.filecache.TestTrackerDistributedCacheManager FAILED
Test org.apache.hadoop.mapred.TestMiniMRMapRedDebugScript FAILED
Test org.apache.hadoop.mapred.TestRecoveryManager FAILED
Test org.apache.hadoop.mapred.TestTaskTrackerLocalization FAILED
Test org.apache.hadoop.mapred.lib.TestCombineFileInputFormat FAILED
Test org.apache.hadoop.metrics2.impl.TestMetricsSystemImpl FAILED
Test org.apache.hadoop.tools.rumen.TestRumenJobTraces FAILED
Test org.apache.hadoop.hdfsproxy.TestHdfsProxy FAILED

The JDK version in my testing environment is Sun JDK 1.6u19, and the ant version is 1.8.2. Does anybody know what causes these test case failures? Any comments/suggestions would be highly appreciated. -- Best Regards, Li Yu
Reader/Writer problem in HDFS
Hi, We have a job where the map tasks are given the path to an output folder. Each map task writes a single file to that folder. There is no reduce phase. There is another thread, which constantly looks for new files in the output folder. If found, it persists the contents to index, and deletes the file. We use this code in the map task:

OutputStream oStream = null;
try {
    oStream = fileSystem.create(path);
    IOUtils.write("xyz", oStream);
} finally {
    IOUtils.closeQuietly(oStream);
}

The problem: Sometimes the reader thread sees & tries to read a file which is not yet fully written to HDFS (or the checksum is not written yet, etc), and throws an error. Is it possible to write an HDFS file in such a way that it won't be visible until it is fully written? We use Hadoop 0.20.203. Thanks, Meghana
Re: Replication and failure
Mohit, I believe Tom's book (Hadoop: The Definitive Guide) covers this precisely well. Perhaps others too. Replication is a best-effort sort of thing. If 2 nodes are all that is available, then two replicas are written and one is left to the replica monitor service to replicate later when possible (leading to an underreplicated write for the moment). The scenario (with default configs) would only fail if you have 0 DataNodes 'available' to write to. Or are you asking about what happens when a DN fails during a write operation? On Thu, Jul 28, 2011 at 5:08 AM, Mohit Anchlia wrote: > Just trying to understand what happens if there are 3 nodes with > replication set to 3 and one node fails. Does it fail the writes too? > > If there is a link that I can look at will be great. I tried searching > but didn't see any definitive answer. > > Thanks, > Mohit > -- Harsh J
RE: where to find the log info
Daniel, You can find those stdout statements in the "{LOG Directory}/userlogs/{task attempt id}/stdout" file. In the same way you can find stderr statements in "{LOG Directory}/userlogs/{task attempt id}/stderr" and log statements in "{LOG Directory}/userlogs/{task attempt id}/syslog". Devaraj K -Original Message- From: Daniel,Wu [mailto:hadoop...@163.com] Sent: Thursday, July 28, 2011 11:47 AM To: common-user@hadoop.apache.org Subject: where to find the log info Hi everyone, I am new to it, and want to do some debugging/logging. I'd like to check what the value is for each mapper execution. If I add the following code in bold, where can I find the log info? If I can't do it in this way, how should I do it? public void map(Object key, Text value, Context context ) throws IOException, InterruptedException { StringTokenizer itr = new StringTokenizer(value.toString()); System.out.println(value.toString()); while (itr.hasMoreTokens()) { word.set(itr.nextToken()); context.write(word, one); } } }
Re: File System Counters.
Raj, There is no overlap. Data read from HDFS FileSystem instances go to HDFS_BYTES_READ, and data read from Local FileSystem instances go to FILE_BYTES_READ. These are two different FileSystems, and have no overlap at all. On Thu, Jul 28, 2011 at 5:56 AM, R V wrote: > Hello > > I don't know if the question has been answered. I am trying to understand > the overlap between FILE_BYTES_READ and HDFS_BYTES_READ. What are the various > components that provide value to this counter? For example when I see > FILE_BYTES_READ for a specific task ( Map or Reduce ) , is it purely due to > the spill during sort phase? If a HDFS read happens on a non local node, does > the counter increase on the node where the data block resides? What happens > when the data is local? does the counter increase for both HDFS_BYTES_READ > and FILE_BYTES_READ? From the values I am seeing, this looks to be the case > but I am not sure. > > I am not very fluent in Java , and hence I don't fully understand the source > . :-( > > Raj -- Harsh J
Re: where to find the log info
Task logs are written to the userlogs directory on the TT nodes. You can view task logs on the JobTracker/TaskTracker web UI for each task at: http://machine:50030/taskdetails.jsp?jobid=<jobid>&tipid=<tipid> All of the syslog, stdout and stderr logs are available in the links to logs off that page. 2011/7/28 Daniel,Wu : > Hi everyone, > > I am new to it, and want to do some debugging/logging. I'd like to check what the > value is for each mapper execution. If I add the following code in bold, > where can I find the log info? If I can't do it in this way, how should I do it? > > public void map(Object key, Text value, Context context > ) throws IOException, InterruptedException { > StringTokenizer itr = new StringTokenizer(value.toString()); > System.out.println(value.toString()); > while (itr.hasMoreTokens()) { > word.set(itr.nextToken()); > context.write(word, one); > } > } > } -- Harsh J