Query on Cost estimates on Hadoop and Java

2013-04-23 Thread Sandeep Jain
Dear Hadoopers,

As far as I know, Hadoop itself is free, but we need Java 1.6 installed on 
our systems to support Hadoop.
Could you please let me know what licensing costs, if any, are involved with 
Java and Hadoop in order to work with them?

Your inputs would be valuable in helping us progress.

Thanks !
Sincere Regards,
Sandeep Jain
M - +91 96633-75072




Fwd: Multiple ways to write Hadoop program driver - Which one to choose?

2013-04-23 Thread Chandrashekhar Kotekar
Hi,


I have observed that there are multiple ways to write the driver method of a
Hadoop program.

The following method is given in the Hadoop Tutorial by Yahoo:

  public void run(String inputPath, String outputPath) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings)
    conf.setOutputKeyClass(Text.class);
    // the values are counts (ints)
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(MapClass.class);
    conf.setReducerClass(Reduce.class);

    FileInputFormat.addInputPath(conf, new Path(inputPath));
    FileOutputFormat.setOutputPath(conf, new Path(outputPath));

    JobClient.runJob(conf);
  }

and this method is given in Hadoop: The Definitive Guide (O'Reilly, 2012):

public static void main(String[] args) throws Exception {
  if (args.length != 2) {
    System.err.println("Usage: MaxTemperature <input path> <output path>");
    System.exit(-1);
  }
  Job job = new Job();
  job.setJarByClass(MaxTemperature.class);
  job.setJobName("Max temperature");
  FileInputFormat.addInputPath(job, new Path(args[0]));
  FileOutputFormat.setOutputPath(job, new Path(args[1]));
  job.setMapperClass(MaxTemperatureMapper.class);
  job.setReducerClass(MaxTemperatureReducer.class);
  job.setOutputKeyClass(Text.class);
  job.setOutputValueClass(IntWritable.class);
  System.exit(job.waitForCompletion(true) ? 0 : 1);
}

While trying the program given in the O'Reilly book, I found that the
constructors of the Job class are deprecated. As the O'Reilly book is based on
Hadoop 2 (YARN), I was surprised to see that it uses deprecated constructors.

I would like to know: which method does everyone use?
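
(For reference, a minimal sketch of a driver using the Job.getInstance(Configuration)
factory that replaces the deprecated Job constructors in Hadoop 0.23/2.x. The class
names simply mirror the book's example and are illustrative, not authoritative.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperatureDriver {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperatureDriver <input path> <output path>");
      System.exit(-1);
    }
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "Max temperature");   // non-deprecated factory
    job.setJarByClass(MaxTemperatureDriver.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(MaxTemperatureMapper.class);        // mapper/reducer as in the book
    job.setReducerClass(MaxTemperatureReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}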







Regards,
Chandrash3khar K0tekar
Mobile - 8884631122


Re: Hadoop sampler related query!

2013-04-23 Thread Mahesh Balija
Hi Rahul,

             The limitation of using InputSampler is that K and OK (I mean the
Map INKEY and OUTKEY) must both be of the same type.
             Technically this is because, while collecting the samples (i.e.,
the array list of keys) in the writePartitionFile method, it uses the INKEY as
the key, and for writing the partition file it uses the Mapper OUTKEY as the
KEY.

             Logically this is also the expected behavior of sampling: while
collecting the samples, the only source is the input splits (INKEY), and the
partition file has to be generated based on the Mapper OUTKEY type.

Best,
Mahesh Balija,
CalsoftLabs.
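
(To illustrate the point above, here is a minimal total-order-sort driver sketch
against the old org.apache.hadoop.mapred API in Hadoop 1.x. The class name and the
path arguments are placeholders. Note the identity mapper: the map input key type
and the map output key type are the same, which is exactly the constraint
described above.)

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.*;

public class TotalSortSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TotalSortSketch.class);
    conf.setJobName("total-sort-sketch");
    conf.setInputFormat(SequenceFileInputFormat.class);     // assumes Text keys/values
    conf.setOutputFormat(SequenceFileOutputFormat.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setNumReduceTasks(4);
    // Identity mapper: map INKEY and OUTKEY are both Text.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setPartitionerClass(TotalOrderPartitioner.class);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Sample the *map input* keys and write the partition file that
    // TotalOrderPartitioner will read.
    TotalOrderPartitioner.setPartitionFile(conf, new Path(args[2]));
    InputSampler.Sampler<Text, Text> sampler =
        new InputSampler.RandomSampler<Text, Text>(0.1, 10000, 10);
    InputSampler.writePartitionFile(conf, sampler);
    JobClient.runJob(conf);
  }
}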


On Tue, Apr 23, 2013 at 4:12 PM, Rahul Bhattacharjee <
rahul.rec@gmail.com> wrote:

> + mapred dev
>
>
> On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee <
> rahul.rec@gmail.com> wrote:
>
>> Hi,
>>
>> I have a question related to Hadoop's input sampler, which is used for
>> investigating the data set beforehand using random selection, sampling,
>> etc. It is mainly used for total sort, and is used in Pig's skewed join
>> implementation as well.
>>
>> The question here is -
>>
>> Mapper<K, V, OK, OV>
>>
>> K and V are the input key and value of the mapper, essentially coming in
>> from the input format. OK and OV are the output key and value emitted from
>> the mapper.
>>
>> Looking at the input sampler's code, it looks like it is creating the
>> partitions based on the input key of the mapper.
>>
>> I think the partitions should be created considering the output key (OK)
>> and the output key sort comparator should be used for sorting the samples.
>>
>> If partitioning is done based on the input key and the mapper emits a
>> different key, then the total sort wouldn't hold.
>>
>> Is there any condition that the input sampler is to be used only for the
>> mapper?
>>
>>
>> Thanks,
>> Rahul
>>
>>
>


Pipes Property Not Passed In

2013-04-23 Thread Xun TANG
I implemented my own InputFormat/RecordReader, and I am trying to run it with
Hadoop Pipes. I understand I could pass properties to a Pipes program either
as:


<property>
  <name>hadoop.pipes.java.recordreader</name>
  <value>false</value>
</property>


or alternatively "-D hadoop.pipes.java.recordreader=false".

However, when I ran with the above configuration and with my record reader, I
got the error:
Hadoop Pipes Exception: RecordReader defined when not needed. at
impl/HadoopPipes.cc:798 in virtual void
HadoopPipes::TaskContextImpl::runMap(std::string, int, bool)

Pipes did not seem to pick up my setting of hadoop.pipes.java.recordreader as
false.

I've tried using conf.xml, passing -D, or a combination of both. None worked.
I've googled the whole day but could not find the reason. Did I miss something
here?

I am using hadoop-1.0.4.

Here is my conf.xml



  
<configuration>
  <property>
    <name>hadoop.pipes.executable</name>
    <value>bin/cpc</value>
  </property>
  <property>
    <name>hadoop.pipes.java.recordreader</name>
    <value>false</value>
  </property>
  <property>
    <name>hadoop.pipes.java.recordwriter</name>
    <value>true</value>
  </property>
</configuration>


Here is the command

$HADOOP pipes \
-conf $CONF \
-files 0 \
-libjars $HADOOP_HOME/build/hadoop-test-1.0.4-SNAPSHOT.jar \
-input $IN \
-output $OUT \
-program bin/$NAME \
-reduces 0 -reduce org.apache.hadoop.mapred.lib.IdentityReducer \
-inputformat org.apache.hadoop.mapred.pipes.WordCountInputFormat

where $CONF is full path to conf.xml

I could provide more info if that helps to determine the reason.


Re: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)

2013-04-23 Thread Vikas Jadhav
Thanks for the reply.

I will try to implement it. I think the problem in my case is that I have
modified the write function of the mapper's context.write and tried to write
the same key-value pair multiple times. For this purpose I have also modified
the partitioner class: my partitioner class doesn't return a single value, it
returns a list (an integer array) indicating which partitions I should write
the key-value pairs to.




On Tue, Apr 23, 2013 at 1:15 PM, Sofia Georgiakaki
wrote:

> Hello,
>
> Sorting is done by the SortingComparator which performs sorting based on
> the value of key. A possible solution would be the following:
> You could write a custom Writable comparable class which extends
> WritableComparable (let's call it MyCompositeFieldWritableComparable), that
> will store your current key and the part of the value that you want your
> sorting to be based on. As I understand from your description, this
> writable class will have 2 IntWritable fields, e.g
> (FieldA, fieldB)
> (0,4)
> (1,1)
> (2,0)
> Implement the methods equals, sort, hashCode, etc in your custom writable
> to override the defaults. Sorting before the reduce phase will be performed
> based on the compareTo() implementation of your custom writable, so you can
> write it in a way that will compare only fieldB.
> Be careful in the way you implement the methods
> MyCompositeFieldWritableComparable.equals() - it will be used to group
> <key, list(values)> in the reducer -,
> MyCompositeFieldWritableComparable.compareTo() and
> MyCompositeFieldWritableComparable.hashCode().
> So your new KEY class will be MyCompositeFieldWritableComparable.
> As an alternative and cleaner implementation, write the
> MyCompositeFieldWritableComparable class and also a
> HashOnOneFieldPartitioner class (which extends Partitioner) that will do
> something like this:
>
> @Override
> public int getPartition(K key, V value,
>   int numReduceTasks) {
> if (key instanceof MyCompositeFieldWritableComparable)
>  return ( ((MyCompositeFieldWritableComparable)
> key).hashCodeBasedOnFieldB() & Integer.MAX_VALUE) % numReduceTasks;
> else
> return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
>   }
>
>
>
> You can also find related articles in the web, eg
> http://riccomini.name/posts/hadoop/2009-11-13-sort-reducer-input-value-hadoop/
> .
>
> Have a nice day,
> Sofia
>
>   --
>  *From:* Vikas Jadhav 
> *To:* user@hadoop.apache.org
> *Sent:* Tuesday, April 23, 2013 8:44 AM
> *Subject:* Sorting Values sent to reducer NOT based on KEY (Depending on
> part of VALUE)
>
> Hi
>
> How can I sort values in Hadoop using Hadoop's standard sorting (i.e. the
> sorting facility provided by Hadoop)?
>
> Requirement:
>
> 1) Values should be sorted depending on some part of the value
>
> For Exam (KEY,VALUE)
>
>  (0,"BC,4,XY')
>  (1,"DC,1,PQ")
>  (2,"EF,0,MN")
>
> The sorted sequence reaching the reducer should be
>
> (2,"EF,0,MN")
> (1,"DC,1,PQ")
> (0,"BC,4,XY')
>
> Here sorting depends on the position of the second attribute in the value.
>
> Thanks
>
>
>
> --
>   Regards,
>   Vikas
>
>
>


-- 
  Regards,
  Vikas


namenode memory test

2013-04-23 Thread 自己
Hi, I would like to know how much memory our data takes on the namenode per
block, file and directory.
For example, the metadata size of a file.
When I store some files in HDFS, how can I find out how much namenode memory
they take?
Are there any tools or commands to measure the memory taken on the namenode?
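
(Not an exact answer, but a commonly cited rule of thumb is that every file,
directory and block is represented as an object in the namenode's heap at very
roughly 150 bytes each. A back-of-the-envelope sketch with made-up numbers --
actual usage depends on Hadoop version, path lengths and block counts:)

public class NamenodeHeapEstimate {
  public static void main(String[] args) {
    long files = 1000000L;        // hypothetical: 1 million files
    long blocksPerFile = 1L;      // hypothetical: each file fits in one block
    long dirs = 50000L;           // hypothetical directory count
    long bytesPerObject = 150L;   // rough rule-of-thumb cost per namenode object

    long objects = files + files * blocksPerFile + dirs;
    System.out.printf("~%d objects, ~%d MB of namenode heap%n",
        objects, objects * bytesPerObject / (1024 * 1024));
  }
}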


I'm looking forward to your reply! Thanks!

How to make the setting changes work

2013-04-23 Thread Geelong Yao
Hi

Sorry to interrupt you, but nobody has answered my question on the Hadoop
mailing list. I have hit an issue after changing hdfs-site.xml to add another
directory to dfs.data.dir in my cluster. /usr/hadoop/tmp/dfs/data is the
default value and /sda is the new one:


<property>
  <name>data.dfs.dir</name>
  <value>/usr/hadoop/tmp/dfs/data,/sda</value>
</property>

The permission of /sda has been changed.

[image: inline screenshot]

First I stopped the cluster with stop-all.sh and replaced the old
hdfs-site.xml with the new one; then I restarted the cluster with
start-all.sh, but I found that the datanode did not start.

-- 
From Good To Great

Re: Import with Sqoop

2013-04-23 Thread Kathleen Ting
Hi Kevin,

1. What's the output from hadoop fs -cat CatalogProducts/part-m-0

2. Can you re-run with the --verbose option - i.e.

sqoop import --connect
'jdbc:sqlserver://nbreports:1433;databaseName=productcatalog'
--username  --password  --table CatalogProducts
--verbose

If you're comfortable doing so, it would be helpful for me to see the
entire Sqoop console output - please send it to me off-list if you're
happy to do so.

Regards, Kate

On Tue, Apr 23, 2013 at 4:06 PM, Kevin Burton  wrote:
> I execute the line:
>
>
>
> sqoop import --connect
> 'jdbc:sqlserver://nbreports:1433;databaseName=productcatalog' --username
>  --password  --table CatalogProducts
>
>
>
> And I get the following output:
>
>
>
> Warning: /usr/lib/hbase does not exist! HBase imports will fail.
>
> Please set $HBASE_HOME to the root of your HBase installation.
>
> 13/04/23 18:00:36 WARN tool.BaseSqoopTool: Setting your password on the
> command-line is insecure. Consider using -P instead.
>
> 13/04/23 18:00:36 INFO manager.SqlManager: Using default fetchSize of 1000
>
> . . . . .
>
> 
>
>
>
> But it doesn't seem to do anything. I executed 'hadoop fs -ls' and I didn't
> see anything. Any ideas what I have done wrong?
>
>
>
> Kevin


Import with Sqoop

2013-04-23 Thread Kevin Burton
I execute the line:

 

sqoop import --connect
'jdbc:sqlserver://nbreports:1433;databaseName=productcatalog' --username
 --password  --table CatalogProducts

 

And I get the following output:

 

Warning: /usr/lib/hbase does not exist! HBase imports will fail.

Please set $HBASE_HOME to the root of your HBase installation.

13/04/23 18:00:36 WARN tool.BaseSqoopTool: Setting your password on the
command-line is insecure. Consider using -P instead.

13/04/23 18:00:36 INFO manager.SqlManager: Using default fetchSize of 1000

. . . . .



 

But it doesn't seem to do anything. I executed 'hadoop fs -ls' and I didn't
see anything. Any ideas what I have done wrong?

 

Kevin



CfP 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)

2013-04-23 Thread MHPC 2013
we apologize if you receive multiple copies of this message
===

CALL FOR PAPERS

2013 Workshop on

Middleware for HPC and Big Data Systems

MHPC '13

as part of Euro-Par 2013, Aachen, Germany

===

Date: August 27, 2013

Workshop URL: http://m-hpc.org

Springer LNCS

SUBMISSION DEADLINE:

May 31, 2013 - LNCS Full paper submission (rolling abstract submission)
June 28, 2013 - Lightning Talk abstracts


SCOPE

Extremely large, diverse, and complex data sets are generated from
scientific applications, the Internet, social media and other applications.
Data may be physically distributed and shared by an ever larger community.
Collecting, aggregating, storing and analyzing large data volumes
presents major challenges. Processing such amounts of data efficiently
has become an obstacle to scientific discovery and technological
advancement. In addition, making the data accessible, understandable and
interoperable involves unsolved problems. Novel middleware architectures,
algorithms, and application development frameworks are required.

In this workshop we are particularly interested in original work at the
intersection of HPC and Big Data with regard to middleware handling
and optimizations. Scope is existing and proposed middleware for HPC
and big data, including analytics libraries and frameworks.

The goal of this workshop is to bring together software architects,
middleware and framework developers, data-intensive application developers
as well as users from the scientific and engineering community to exchange
their experience in processing large datasets and to report their scientific
achievements and innovative ideas. The workshop also offers a dedicated forum
for these researchers to access the state of the art, to discuss problems
and requirements, to identify gaps in current and planned designs, and to
collaborate in strategies for scalable data-intensive computing.

The workshop will be one day in length, composed of 20 min paper
presentations, each followed by 10 min discussion sections.
Presentations may be accompanied by interactive demonstrations.


TOPICS

Topics of interest include, but are not limited to:

- Middleware including: Hadoop, Apache Drill, YARN, Spark/Shark, Hive, Pig,
  Sqoop, HBase, HDFS, S4, CIEL, Oozie, Impala, Storm and Hyrack
- Data intensive middleware architecture
- Libraries/Frameworks including: Apache Mahout, Giraph, UIMA and GraphLab
- NG Databases including Apache Cassandra, MongoDB and CouchDB/Couchbase
- Schedulers including Cascading
- Middleware for optimized data locality/in-place data processing
- Data handling middleware for deployment in virtualized HPC environments
- Parallelization and distributed processing architectures at the
middleware level
- Integration with cloud middleware and application servers
- Runtime environments and system level support for data-intensive computing
- Skeletons and patterns
- Checkpointing
- Programming models and languages
- Big Data ETL
- Stream processing middleware
- In-memory databases for HPC
- Scalability and interoperability
- Large-scale data storage and distributed file systems
- Content-centric addressing and networking
- Execution engines, languages and environments including CIEL/Skywriting
- Performance analysis, evaluation of data-intensive middleware
- In-depth analysis and performance optimizations in existing data-handling
middleware, focusing on indexing/fast storing or retrieval between compute
and storage nodes
- Highly scalable middleware optimized for minimum communication
- Use cases and experience for popular Big Data middleware
- Middleware security, privacy and trust architectures

DATES

Papers:
Rolling abstract submission
May 31, 2013 - Full paper submission
July 8, 2013 - Acceptance notification
October 3, 2013 - Camera-ready version due

Lightning Talks:
June 28, 2013 - Deadline for lightning talk abstracts
July 15, 2013 - Lightning talk notification

August 27, 2013 - Workshop Date


TPC

CHAIR

Michael Alexander (chair), TU Wien, Austria
Anastassios Nanos (co-chair), NTUA, Greece
Jie Tao (co-chair), Karlsruhe Institute of Technology, Germany
Lizhe Wang (co-chair), Chinese Academy of Sciences, China
Gianluigi Zanetti (co-chair), CRS4, Italy

PROGRAM COMMITTEE

Amitanand Aiyer, Facebook, USA
Costas Bekas, IBM, Switzerland
Jakob Blomer, CERN, Switzerland
William Gardner, University of Guelph, Canada
José Gracia, HPC Center of the University of Stuttgart, Germany
Zhenghua Guom,  Indiana University, USA
Marcus Hardt,  Karlsruhe Institute of Technology, Germany
Sverre Jarp, CERN, Switzerland
Christopher Jung,  Karlsruhe Institute of Technology, Germany
Andreas Knüpfer - Technische Universität Dresden, Germany
Nectarios Koziris, National Technical University of Athens, Greece
Yan Ma, Chinese Academy of Sciences, China
Martin Schulz - Lawrence Livermore National Laboratory
Viral Shah, MIT Jul

Re: Hadoop MapReduce

2013-04-23 Thread Daryn Sharp
MR has a "local mode" that does what you want.  Pig has the ability to use this 
mode.  I did a quick search but didn't immediately find a good link to 
documentation, but hopefully this gets you going in the right direction.

Daryn
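
(For what it's worth, a minimal sketch of the client-side settings that select
local mode with classic MapReduce -- property names are from Hadoop 0.20/1.x and
would normally live in a separate "local" configuration directory rather than in
code:)

import org.apache.hadoop.conf.Configuration;

public class LocalModeConf {
  // Sketch only: with these two settings a classic-MR job runs in a single
  // local JVM and reads/writes the local filesystem instead of HDFS.
  public static Configuration localConf() {
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "local");   // local job runner, no JobTracker
    conf.set("fs.default.name", "file:///");   // local filesystem, not HDFS
    return conf;
  }
}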

On Apr 22, 2013, at 6:01 PM, David Gkogkritsiani wrote:

Hello,

I have undertaken my diploma thesis on Hadoop MapReduce and have been asked
to write an application in MapReduce.
I found this code on the internet and ran it:

http://paste.ubuntu.com/5591999/

How can I modify the code to store the pages somewhere locally (text only, no
images) and then have them processed? That is, I need MapReduce code which
would download pages from the web and store them on the local file system
rather than in HDFS.
After that I would run the program so that it does not depend on network
speed, because my network is very slow.

I want to do this to improve performance.

I am running Hadoop version 0.20.2.
I am new to Hadoop and am somewhat lost; any help would be greatly appreciated.

Thanks in advance for any assistance !



Re: Sqoop installation?

2013-04-23 Thread Chris Nauroth
Transferring to user list (hdfs-dev bcc'd).

Hi Kevin,

The datanodes are definitely more disposable than the namenodes.  If a
Sqoop command unexpectedly consumes a lot of resources, then stealing
resources from the namenode could impact performance of the whole cluster.
 Stealing resources from a datanode would just impact performance of tasks
scheduled to that node, and if things got really bad, then the datanode
would get blacklisted and tasks would get scheduled elsewhere anyway.

If you have hardware to spare, then you may want to consider reserving a
machine for data staging and ad-hoc commands.  This would be similar to the
other nodes in terms of Hadoop software installation and configuration, but
it wouldn't run any of the daemons.  The benefit is that you can isolate
some of these data staging and ad-hoc commands so that there is no risk of
harming nodes participating in the cluster.  The drawback is that you end
up with idle capacity when you're not running these commands, and that
capacity could have been used for map and reduce tasks.

There are some trade-offs to consider, but my biggest recommendation is
don't use the namenode.  :-)

Hope this helps,
--Chris



On Tue, Apr 23, 2013 at 8:21 AM, Kevin Burton wrote:

> The Apache documentation on installing a Sqoop server indicates:
>
>
>
> "Copy Sqoop artifact on machine where you want to run Sqoop server. This
> machine must have installed and configured Hadoop. You don't need to run
> any
> Hadoop related services there, however the machine must be able to act as
> an
> Hadoop client. "
>
>
>
> I have a NameNode Server and a bunch of DataNodes as well as a backup
> namenode server. Anyone of these servers could function as a Sqoop server
> but based on some hearsay it is fair to say that some of these machines may
> have more compute cycles available in a production environment than others.
> Any recommendations as to which machine in a Hadoop cluster would best be
> able to meet the needs of a Sqoop server?
>
>
>
> Thank you.
>
>
>
> Kevin Burton
>
>


Re: Uploading file to HDFS

2013-04-23 Thread shashwat shriparv
On Tue, Apr 23, 2013 at 9:23 PM, Mohammad Tariq  wrote:

> What should I do on namenode and datanode? Thank you very much


As Tariq has asked, can you provide snapshots of the datanode logs?

*Thanks & Regards*

∞
Shashwat Shriparv


Re: Uploading file to HDFS

2013-04-23 Thread Mohammad Tariq
Hi there,

Could you please show me your config files and DN error logs?

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Tue, Apr 23, 2013 at 4:35 PM, 超级塞亚人  wrote:

> Asking for help! I'm facing the problem of "no datanode to stop". The
> namenode has been started but the datanode can't be started. What should I
> do on the namenode and datanode? Thank you very much
>
>
> 2013/4/19 超级塞亚人 
>
>> I have a problem. Our cluster has 32 nodes and each disk is 1 TB. I want
>> to upload a 2 TB file to HDFS. How can I put the file on the namenode and
>> upload it to HDFS?
>
>
>


Re: Unsubscribe

2013-04-23 Thread Mohammad Tariq
You need to send the request to this address :
user-unsubscr...@hadoop.apache.org

Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com


On Tue, Apr 23, 2013 at 7:14 PM, neeraj.maha...@absicorp.com <
neeraj.maha...@absicorp.com> wrote:

>
> Regards,
> Neeraj Mahajan
>
>
>
>
>


Re: Job launch from eclipse

2013-04-23 Thread Mohammad Tariq
Hello Han,

          The reason behind this is that the jobs are running inside Eclipse
itself and not getting submitted to your cluster. Please see if this link
helps:
http://cloudfront.blogspot.in/2013/03/mapreduce-jobs-running-through-eclipse.html#.UXaQsDWH6IQ


Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
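
(A rough sketch of what explicit submission to a remote cluster looks like with
the Hadoop 1.x API. The host names, ports, jar path and the mapper/reducer class
names below are placeholders, not a tested recipe:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmitSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Point the client at the cluster instead of the local runner (placeholders):
    conf.set("fs.default.name", "hdfs://namenode-host:9000");
    conf.set("mapred.job.tracker", "jobtracker-host:9001");
    // Ship a pre-built job jar; without it the classes only exist inside the IDE:
    conf.set("mapred.jar", "/path/to/wordcount-job.jar");

    Job job = new Job(conf, "wordcount-from-ide");
    job.setMapperClass(WordCountMapper.class);      // hypothetical mapper class
    job.setReducerClass(WordCountReducer.class);    // hypothetical reducer class
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}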


On Tue, Apr 23, 2013 at 6:56 PM, shashwat shriparv <
dwivedishash...@gmail.com> wrote:

> You need to generate a jar file, pass all the parameters at run time if
> any are fixed, and run it on Hadoop like: hadoop jar jarfilename.jar
>
> *Thanks & Regards*
>
> ∞
> Shashwat Shriparv
>
>
>
> On Tue, Apr 23, 2013 at 6:51 PM, Han JU  wrote:
>
>> Hi,
>>
>> I'm getting hands-on with Hadoop. One thing I really want to know is how
>> you launch MR jobs in a development environment.
>>
>> I'm currently using Eclipse 3.7 with the hadoop plugin from hadoop 1.0.2.
>> With this plugin I can manage HDFS and submit jobs to the cluster. But the
>> strange thing is, every job launched from Eclipse in this way is not
>> recorded by the jobtracker (I can't monitor it from the web UI). But the
>> output finally appears in the HDFS path I gave as a parameter. It's really
>> strange; it makes me think it's a standalone job run that then writes its
>> output to HDFS.
>>
>> So how do you code and launch jobs to the cluster?
>>
>> Many thanks.
>>
>> --
>> *JU Han*
>>
>> UTC   -  Université de Technologie de Compiègne
>> * **GI06 - Fouille de Données et Décisionnel*
>>
>> +33 061960
>>
>
>


Unsubscribe

2013-04-23 Thread neeraj.maha...@absicorp.com


Regards,
Neeraj Mahajan






Re: Job launch from eclipse

2013-04-23 Thread shashwat shriparv
You need to generate a jar file, pass all the parameters at run time if any
are fixed, and run it on Hadoop like: hadoop jar jarfilename.jar

*Thanks & Regards*

∞
Shashwat Shriparv



On Tue, Apr 23, 2013 at 6:51 PM, Han JU  wrote:

> Hi,
>
> I'm getting hands-on with Hadoop. One thing I really want to know is how you
> launch MR jobs in a development environment.
>
> I'm currently using Eclipse 3.7 with the hadoop plugin from hadoop 1.0.2.
> With this plugin I can manage HDFS and submit jobs to the cluster. But the
> strange thing is, every job launched from Eclipse in this way is not recorded
> by the jobtracker (I can't monitor it from the web UI). But the output
> finally appears in the HDFS path I gave as a parameter. It's really strange;
> it makes me think it's a standalone job run that then writes its output to HDFS.
>
> So how do you code and launch jobs to the cluster?
>
> Many thanks.
>
> --
> *JU Han*
>
> UTC   -  Université de Technologie de Compiègne
> * **GI06 - Fouille de Données et Décisionnel*
>
> +33 061960
>


Job launch from eclipse

2013-04-23 Thread Han JU
Hi,

I'm getting hands-on with Hadoop. One thing I really want to know is how you
launch MR jobs in a development environment.

I'm currently using Eclipse 3.7 with the hadoop plugin from hadoop 1.0.2. With
this plugin I can manage HDFS and submit jobs to the cluster. But the strange
thing is, every job launched from Eclipse in this way is not recorded by the
jobtracker (I can't monitor it from the web UI). But the output finally
appears in the HDFS path I gave as a parameter. It's really strange; it makes
me think it's a standalone job run that then writes its output to HDFS.

So how do you code and launch jobs to the cluster?

Many thanks.

-- 
*JU Han*

UTC   -  Université de Technologie de Compiègne
* **GI06 - Fouille de Données et Décisionnel*

+33 061960


Re: Uploading file to HDFS

2013-04-23 Thread 超级塞亚人
Asking for help! I'm facing the problem of "no datanode to stop". The namenode
has been started but the datanode can't be started. What should I do on the
namenode and datanode? Thank you very much.


2013/4/19 超级塞亚人 

> I have a problem. Our cluster has 32 nodes and each disk is 1 TB. I want to
> upload a 2 TB file to HDFS. How can I put the file on the namenode and
> upload it to HDFS?


Re: Hadoop sampler related query!

2013-04-23 Thread Rahul Bhattacharjee
+ mapred dev


On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee <
rahul.rec@gmail.com> wrote:

> Hi,
>
> I have a question related to Hadoop's input sampler, which is used for
> investigating the data set beforehand using random selection, sampling,
> etc. It is mainly used for total sort, and is used in Pig's skewed join
> implementation as well.
>
> The question here is -
>
> Mapper<K, V, OK, OV>
>
> K and V are the input key and value of the mapper, essentially coming in
> from the input format. OK and OV are the output key and value emitted from
> the mapper.
>
> Looking at the input sampler's code, it looks like it is creating the
> partitions based on the input key of the mapper.
>
> I think the partitions should be created considering the output key (OK)
> and the output key sort comparator should be used for sorting the samples.
>
> If partitioning is done based on the input key and the mapper emits a
> different key, then the total sort wouldn't hold.
>
> Is there any condition that the input sampler is to be used only for the
> mapper?
>
>
> Thanks,
> Rahul
>
>


Re: unsubscribe

2013-04-23 Thread Panshul Whisper
Lol
On Apr 23, 2013 10:39 AM, "Gustavo Ioschpe" 
wrote:

>
>
>
> Sent from Samsung Mobile
>


unsubscribe

2013-04-23 Thread Gustavo Ioschpe



Sent from Samsung Mobile

Re: Error about MR task when running 2T data

2013-04-23 Thread Geelong Yao
I have set two disks available for temp files: one is /usr, the other is /sda.
But I found that the first one, /usr, is full while /sda has not been used.
Why would this happen, especially when the first path is full?
[image: inline screenshot]


2013/4/23 Harsh J 

> Does your node5 have adequate free space and proper multi-disk
> mapred.local.dir configuration set in it?
>
> On Tue, Apr 23, 2013 at 12:41 PM, 姚吉龙  wrote:
> >
> > Hi Everyone
> >
> > Today I am testing about 2T data on my cluster, there several failed map
> > task and reduce task on same node
> > Here is the log
> >
> > Map failed:
> >
> > org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
> > valid local directory for output/spill0.out
> > at
> >
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> > at
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> > at
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> > at
> >
> org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
> > at
> >
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1392)
> > at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
> > at
> >
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
> > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
> > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:396)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> > at org.apache.hadoop.mapred.Child.main(Child.java:249)
> >
> >
> > Reduce failed:
> > java.io.IOException: Task: attempt_201304211423_0003_r_06_0 - The
> reduce
> > copier failed
> > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> > at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:396)
> > at
> >
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> > at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could
> not
> > find any valid local directory for output/map_10003.out
> > at
> >
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> > at
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> > at
> >
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> > at
> >
> org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:176)
> > at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2742)
> > at
> >
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2706)
> >
> >
> > Does this mean something wrong with the configuration on node5? or this
> is
> > normal when we test the data over TBs.This is the first time I run data
> over
> > TBs
> > Any suggestion is welcome
> >
> >
> > BRs
> > Geelong
> >
> >
> > --
> > From Good To Great
>
>
>
> --
> Harsh J
>



-- 
From Good To Great

Mapper in-memory buffering

2013-04-23 Thread Peter Marron
Hi,

Just a question about the implementation of Map/Reduce.
I've been thinking about the output of the map stage.
Logically all of the records emitted by the mapper have to be partitioned and
sorted before they go into the reducers. (We can ignore the partitioning for the
moment and so I'm just interested in the sorting.)

Now it seems to me that the obvious way to do this would be to have
some sort of sorted structure (balanced binary tree for example) so that
the (K, V) pairs emitted would be held in sorted order.

But when I read White's TDG (3ed p209) it's pretty explicit that the data
emitted is just held in a circular buffer in memory and that the sort occurs
in-memory as part of the dump to disc. Examining the code
(version 1.0.4) MapTask.java proves this to be the case.

So the first question is why is it done this way?

As far as I can see buffering the objects and doing one sort at the
end is going to be computational complexity of order NlgN
but maintaining a sorted in-memory structure is going to be of
the same order so, at least asymptotically, there isn't going to be
much difference between the two approaches.

I can see that by serializing the objects into the circular buffer
immediately you can minimize the number of key and value objects
that need to be instantiated. (Particularly if the mapper re-uses its
objects and custom comparators are used then barely any objects
need to be constructed or destroyed.) In this way I guess that the
load on the heap can be kept minimal.

Is this the reason that it is done this way?

Is there some other reason, perhaps a property of Java (I'm no
expert) which makes one way preferable to the other?

It's just that if the emitted data were held sorted then the opportunity
to run the "combiner" much earlier seems to be much easier.
For example, if we were running WordCount (or anything that needs
the SumReducer) then we could just increment the counts
held in-memory and we'd never have to emit duplicates.

(In fact I'm tempted to hold a HashSet of references to the
(K,V) pairs held serialized in the buffer and then incrementing
the counts in memory so that I don't need to emit duplicates.
But this seems perverse and far too implementation dependent.)
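
(As an aside, the in-memory aggregation described above is essentially the
"in-mapper combining" pattern; a minimal sketch using the new
org.apache.hadoop.mapreduce API -- the class name is illustrative:)

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Counts are aggregated in a HashMap and emitted once in cleanup(), so
// duplicate (word, 1) pairs are never written into the map output buffer.
public class InMapperCombiningWordCount
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private final Map<String, Integer> counts = new HashMap<String, Integer>();

  @Override
  protected void map(LongWritable key, Text value, Context context) {
    for (String w : value.toString().split("\\s+")) {
      if (w.isEmpty()) continue;
      Integer c = counts.get(w);
      counts.put(w, c == null ? 1 : c + 1);
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
  }
}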

Any thoughts?

Z



unsubscribe

2013-04-23 Thread Nitzan Raanan



"This e-mail message may contain confidential, commercial or privileged 
information that constitutes proprietary information of Comverse Technology or 
its subsidiaries. If you are not the intended recipient of this message, you 
are hereby notified that any review, use or distribution of this information is 
absolutely prohibited and we request that you delete all copies and contact us 
by e-mailing to: secur...@comverse.com. Thank You."


Re: Sorting Values sent to reducer NOT based on KEY (Depending on part of VALUE)

2013-04-23 Thread Sofia Georgiakaki
Hello,

Sorting is done by the SortingComparator which performs sorting based on the 
value of key. A possible solution would be the following:
You could write a custom Writable comparable class which extends 
WritableComparable (let's call it MyCompositeFieldWritableComparable), that will 
store your current key and the part of the value that you want your sorting to 
be based on. As I understand from your description, this writable class will 
have 2 IntWritable fields, e.g
(FieldA, fieldB)

(0,4)
(1,1)
(2,0)
Implement the methods equals, sort, hashCode, etc in your custom writable to 
override the defaults. Sorting before the reduce phase will be performed based 
on the compareTo() implementation of your custom writable, so you can write it 
in a way that will compare only fieldB. 

Be careful in the way you implement the methods
MyCompositeFieldWritableComparable.equals() - it will be used to group
<key, list(values)> in the reducer -, MyCompositeFieldWritableComparable.compareTo()
and MyCompositeFieldWritableComparable.hashCode().
So your new KEY class will be MyCompositeFieldWritableComparable.
As an alternative and cleaner implementation, write the 
MyCompositeFieldWritableComparable class and also a HashOnOneFieldPartitioner 
class (which extends Partitioner) that will do something like this:

@Override
public int getPartition(K key, V value, int numReduceTasks) {
    if (key instanceof MyCompositeFieldWritableComparable)
        return (((MyCompositeFieldWritableComparable) key).hashCodeBasedOnFieldB()
                & Integer.MAX_VALUE) % numReduceTasks;
    else
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}




You can also find related articles in the web, eg 
http://riccomini.name/posts/hadoop/2009-11-13-sort-reducer-input-value-hadoop/.

Have a nice day,
Sofia
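
(To make the composite-key idea above concrete, a minimal sketch -- the field
names are illustrative, and the equals()/hashCode() choices should be adapted
to the grouping and partitioning you actually want, as described above:)

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// fieldA carries the original key, fieldB carries the part of the value that
// drives the sort order seen by the reducer.
public class MyCompositeFieldWritableComparable
    implements WritableComparable<MyCompositeFieldWritableComparable> {

  private int fieldA;
  private int fieldB;

  public MyCompositeFieldWritableComparable() {}

  public MyCompositeFieldWritableComparable(int fieldA, int fieldB) {
    this.fieldA = fieldA;
    this.fieldB = fieldB;
  }

  public void write(DataOutput out) throws IOException {
    out.writeInt(fieldA);
    out.writeInt(fieldB);
  }

  public void readFields(DataInput in) throws IOException {
    fieldA = in.readInt();
    fieldB = in.readInt();
  }

  // Sorting before the reduce phase is driven by fieldB only.
  public int compareTo(MyCompositeFieldWritableComparable o) {
    return fieldB < o.fieldB ? -1 : (fieldB == o.fieldB ? 0 : 1);
  }

  @Override
  public int hashCode() {
    return fieldB;   // keep consistent with equals() and the partitioner
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof MyCompositeFieldWritableComparable
        && ((MyCompositeFieldWritableComparable) o).fieldB == fieldB;
  }
}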




>
> From: Vikas Jadhav 
>To: user@hadoop.apache.org 
>Sent: Tuesday, April 23, 2013 8:44 AM
>Subject: Sorting Values sent to reducer NOT based on KEY (Depending on part of 
>VALUE)
> 
>
>
>Hi 
> 
>How can I sort values in Hadoop using Hadoop's standard sorting (i.e. the
>sorting facility provided by Hadoop)?
> 
>Requirement: 
> 
>1) Values should be sorted depending on some part of the value
> 
>For Exam (KEY,VALUE)
> 
> (0,"BC,4,XY')
> (1,"DC,1,PQ")
> (2,"EF,0,MN")
> 
>The sorted sequence reaching the reducer should be
> 
>(2,"EF,0,MN")
>(1,"DC,1,PQ")
>(0,"BC,4,XY')
> 
>Here sorting depends on the position of the second attribute in the value.
> 
>Thanks
> 
>
>-- 
>  Regards,
>   Vikas 
>
>

RE: How to open DEBUG level for YARN application master ?

2013-04-23 Thread Nitzan Raanan
Thanks

That worked !

BR
Raanan Nitzan

-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Tuesday, April 23, 2013 9:42 AM
To: 
Subject: Re: How to open DEBUG level for YARN application master ?

To change the MR AM's default log level from INFO, set the job config:
"yarn.app.mapreduce.am.log.level" to DEBUG or whatever level you
prefer.
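
(A minimal per-job sketch -- the property name is as given above, everything
else is illustrative; for jobs run through ToolRunner the same setting can
typically be passed as -Dyarn.app.mapreduce.am.log.level=DEBUG on the command
line:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class DebugAmLogLevel {
  public static Job newJob() throws Exception {
    Configuration conf = new Configuration();
    // Bump the MR ApplicationMaster log level for this job (MRv2/YARN):
    conf.set("yarn.app.mapreduce.am.log.level", "DEBUG");
    return Job.getInstance(conf, "debug-am-example");   // job name is illustrative
  }
}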

On Mon, Apr 22, 2013 at 6:21 PM, Nitzan Raanan
 wrote:
> Hi
>
>
>
> How do I open the DEBUG level for YARN application master  process ?
>
> I couldn't find the relevant configuration to set (probably need to add
> it somewhere).
>
>
>
> ps -ef | grep java | grep yarn
>
> yarn 12585 29113  0 15:33 ?00:00:00 /bin/bash -c
> /usr/java/jre1.6/bin/java -Dlog4j.configuration=container-log4j.properties
> -Dyarn.app.mapreduce.container.log.dir=/var/log/hadoop-yarn/containers/application_1366187149316_0049/container_1366187149316_0049_01_01
> -Dyarn.app.mapreduce.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA
> -Xmx1024m org.apache.hadoop.mapreduce.v2.app.MRAppMaster
> 1>/var/log/hadoop-yarn/containers/application_1366187149316_0049/container_1366187149316_0049_01_01/stdout
> 2>/var/log/hadoop-yarn/containers/application_1366187149316_0049/container_1366187149316_0049_01_01/stderr
>
>
>
>
>
> I’m using following YARN version
>
> # rpm -qa | grep yarn
>
> hadoop-yarn-nodemanager-2.0.0+922-1.cdh4.2.0.p0.12.el5
>
> hadoop-yarn-nodemanager-2.0.0+88-1.cdh4.0.0.p0.26.el5
>
> hadoop-yarn-2.0.0+922-1.cdh4.2.0.p0.12.el5
>
>
>
> Thanks
>
> Raanan Nitzan,
>
> R&D Developer
>
> Comverse
>
>
>
>
> 



--
Harsh J



Re: Error about MR task when running 2T data

2013-04-23 Thread Harsh J
Does your node5 have adequate free space and proper multi-disk
mapred.local.dir configuration set in it?

On Tue, Apr 23, 2013 at 12:41 PM, 姚吉龙  wrote:
>
> Hi Everyone
>
> Today I am testing with about 2 TB of data on my cluster; there are several
> failed map tasks and reduce tasks on the same node.
> Here is the log:
>
> Map failed:
>
> org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
> valid local directory for output/spill0.out
> at
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> at
> org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
> at
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1392)
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
> at
> org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> Reduce failed:
> java.io.IOException: Task: attempt_201304211423_0003_r_06_0 - The reduce
> copier failed
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
> at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
> find any valid local directory for output/map_10003.out
> at
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
> at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
> at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
> at
> org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:176)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2742)
> at
> org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2706)
>
>
> Does this mean something is wrong with the configuration on node5, or is
> this normal when testing data over terabytes? This is the first time I have
> run data over terabytes.
> Any suggestion is welcome.
>
>
> BRs
> Geelong
>
>
> --
> From Good To Great



-- 
Harsh J


Re:

2013-04-23 Thread Mikael Sitruk
Hi Suneel,

In addition to Mohammad's answer -
In order for someone to be able to translate this to Pig Latin, you should
also indicate how the data is stored: are you using HBase to store the data,
or is it directly on HDFS? Based on this you can transform your field
mappings. You also need to explain the schema.

You should probably try the Pig tutorial to get a better understanding of how
to perform queries. MAX, COUNT, FOREACH, GROUP, FILTER and JOIN (inner &
outer) are operators supported by Pig. Sub-selects should be broken into
different "selects" (a.k.a. relations).
For the CASE statements I presume you will have to implement some UDF.


Mikael.S




On Mon, Apr 22, 2013 at 11:31 PM, Mohammad Tariq  wrote:

> Hello Suneel,
>
>Not the answer to your question, but definitely gonna help you in
> getting the answer.
>
> This list is specific to Hadoop users. It would be better if you ask this
> question on the Pig mailing list.
>
> Providing a subject line is always a good habit.
>
> It's always better to show something which you have tried. People on the
> mailing list might not be free enough to go through big complex queries or
> programs. But they could definitely try to help you in order to make it
> work or better.
>
> HTH
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Mon, Apr 22, 2013 at 2:46 PM, suneel hadoop 
> wrote:
>
>> Can any one help me to change this SQL to pig Latin
>>
>>
>>
>> SELECT ('CSS'||DB.DISTRICT_CODE||DB.BILLING_ACCOUNT_NO) BAC_KEY,
>>
>> CASE WHEN T1.TAC_142 IS NULL THEN 'N' ELSE T1.TAC_142 END TAC_142 FROM
>>
>> (
>>
>>
>>
>> SELECT DISTRICT_CODE,BILLING_ACCOUNT_NO,
>>
>> MAX(CASE WHEN TAC_1 = 'Y' AND (TAC_2 = 'Y' OR TAC_3 = 'Y') THEN 'Y' ELSE
>> 'N' END) TAC_142 FROM
>>
>> (
>>
>> SELECT DI.DISTRICT_CODE,DI.BILLING_ACCOUNT_NO,DI.INST_SEQUENCE_NO,
>>
>> MAX(CASE WHEN TRIM(DIP.PRODUCT_CODE) = 'A14493' AND UPPER(DI.HAZARD) LIKE
>> '%999%EMERGENCY%LINE%' AND UPPER(DI.WARNING) LIKE '%USE%999%ALERT%METHOD%'
>> THEN 'Y' ELSE 'N' END) TAC_1,
>>
>> MAX(CASE WHEN TRIM(DIP.PRODUCT_TYPE) IN ('20','21') AND
>> TRIM(DIP.MAINTENANCE_CONTRACT) IN ('E','T') THEN 'Y' ELSE 'N' END) TAC_2,
>>
>> MAX(CASE WHEN TRIM(DIP.PRODUCT_CODE) IN ('A14498','A14428','A22640') THEN
>> 'Y' ELSE 'N' END) TAC_3
>>
>> FROM
>>
>> D_INSTALLATION DI,
>>
>> D_INSTALLATION_PRODUCT DIP
>>
>> WHERE
>>
>> DIP.INST_SEQUENCE_NO = DI.INST_SEQUENCE_NO AND
>>
>> DIP.BAC_WID = DI.BAC_WID
>>
>> GROUP BY DI.DISTRICT_CODE,DI.BILLING_ACCOUNT_NO,DI.INST_SEQUENCE_NO
>>
>> )
>>
>> GROUP BY DISTRICT_CODE,BILLING_ACCOUNT_NO)
>>
>> T1,
>>
>> D_BILLING_ACCOUNT DB
>>
>> WHERE
>>
>> DB.DISTRICT_CODE = T1.DISTRICT_CODE(+) AND
>>
>> DB.BILLING_ACCOUNT_NO = T1.BILLING_ACCOUNT_NO(+)
>>
>
>


Error about MR task when running 2T data

2013-04-23 Thread 姚吉龙
Hi Everyone

Today I am testing with about 2 TB of data on my cluster; there are several
failed map tasks and reduce tasks on the same node.
Here is the log:

Map failed:

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any
valid local directory for output/spill0.out
at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
 at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
 at
org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1392)
 at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1298)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:699)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
 at org.apache.hadoop.mapred.Child.main(Child.java:249)


Reduce failed:
java.io.IOException: Task: attempt_201304211423_0003_r_06_0 - The
reduce copier failed
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
 at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not
find any valid local directory for output/map_10003.out
at
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
 at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
 at
org.apache.hadoop.mapred.MapOutputFile.getInputFileForWrite(MapOutputFile.java:176)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2742)
 at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2706)


Does this mean something is wrong with the configuration on node5, or is this
normal when testing data over terabytes? This is the first time I have run
data over terabytes.
Any suggestion is welcome.


BRs
Geelong


-- 
From Good To Great