Thanks for putting this up, it's very useful.
I'd encourage you to contribute this as a documentation patch, so that you help
everyone who comes to hadoop.apache.org, plus you can be a part of the project
and a contributor.
I can help with the mechanics - here is a link to help you get started:
On Wed, 31 Aug 2011 08:44:42 -0700
Mohit Anchlia mohitanch...@gmail.com wrote:
Does map-reduce work well with binary content in the file? This
binary content is basically some CAD files, and the map-reduce program needs
to read these files using some proprietary tool, extract values and do
some
Hi
I use Hadoop for a MapReduce job in my system. I would like to have the
job run every 5th minute. Is there any distributed timer-job facility in
Hadoop? Of course I could set up a timer in an external timer framework
(cron or something like that) that invokes the MapReduce job. But cron
is
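For reference, the external-trigger option mentioned above would be a single crontab entry on an edge node (jar path and driver class below are made up for illustration); the catch, as the rest of the thread discusses, is that the machine running cron is a single point of failure:

```shell
# Submit the job driver every 5 minutes. Hypothetical jar path and class name;
# this runs on ONE machine, so it is not fault tolerant by itself.
*/5 * * * *  /usr/lib/hadoop/bin/hadoop jar /opt/jobs/myjob.jar com.example.MyJobDriver
```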
Hi
Try to use Oozie for job coordination and workflows.
On Thu, Sep 1, 2011 at 12:30 PM, Per Steffensen st...@designware.dk wrote:
Hi
I use Hadoop for a MapReduce job in my system. I would like to have the job
run every 5th minute. Is there any distributed timer-job facility in Hadoop?
Of
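For the record, the Oozie piece that does the timer part is the coordinator. A minimal coordinator definition for a 5-minute cadence looks roughly like this sketch (name, dates and the workflow path are placeholders; schema version per the Oozie docs of that era):

```xml
<coordinator-app name="every-5-min" frequency="${coord:minutes(5)}"
                 start="2011-09-01T00:00Z" end="2012-09-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <action>
    <workflow>
      <!-- HDFS path of the workflow that submits the MapReduce job -->
      <app-path>hdfs://namenode/user/steff/apps/my-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```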
On 25/08/11 08:20, Sagar Shukla wrote:
Hi Hakan,
Please find my comments inline in blue :
-Original Message-
From: Hakan İlter [mailto:hakanil...@gmail.com]
Sent: Thursday, August 25, 2011 12:28 PM
To: common-user@hadoop.apache.org
Subject: Hadoop with Netapp
Hi
On 29/08/11 20:31, Frank Astier wrote:
Is it possible to turn off all the Hadoop logs simultaneously? In my unit
tests, I don’t want to see the myriad “INFO” logs spewed out by various Hadoop
components. I’m using:
((Log4JLogger) DataNode.LOG).getLogger().setLevel(Level.OFF);
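A broader-brush alternative than turning components off one by one is to silence every log4j logger at once. This is a sketch against plain log4j 1.x (which Hadoop of this vintage uses), not a Hadoop-specific API:

```java
import java.util.Enumeration;

import org.apache.log4j.Level;
import org.apache.log4j.LogManager;
import org.apache.log4j.Logger;

public final class QuietHadoopLogs {
    // Turn off all log4j output in one go.
    public static void silenceAll() {
        Logger.getRootLogger().setLevel(Level.OFF);
        // Loggers that were configured with their own explicit level ignore the
        // root setting, so override them individually as well.
        Enumeration<?> loggers = LogManager.getCurrentLoggers();
        while (loggers.hasMoreElements()) {
            ((Logger) loggers.nextElement()).setLevel(Level.OFF);
        }
    }
}
```

Call QuietHadoopLogs.silenceAll() at the top of the test's setUp(); a log4j.properties on the test classpath with log4j.rootLogger=OFF achieves the same thing declaratively.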
Hi
Thanks a lot for pointing me to Oozie. I have looked a little bit into
Oozie and it seems like the component triggering jobs is called
Coordinator Application. But I really see nowhere that this
Coordinator Application doesn't just run on a single machine, and that it
will therefore not
If I get you right, you are asking about installing Oozie as a distributed
and/or HA cluster?!
In that case I am not familiar with an out-of-the-box solution from Oozie.
But I think you can make up a solution of your own, for example:
installing Oozie on two servers on the same partition, which will be
Since this week, my Hadoop has been hitting a problem, with the following information:
Lost task tracker: tracker_rsync.host01:localhost/127.0.0.1:40759
Map output lost, rescheduling:
getMapOutput(attempt_201108021855_6734_m_97_1,2002) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could
Hi,
I have successfully installed scipy on my Python 2.7 on my local Linux, and
I want to pack my Python2.7 (with scipy) onto Hadoop and run my Python
MapReduce scripts, like this:
${HADOOP_HOME}/bin/hadoop streaming \
    -input ${input} \
    -output ${output} \
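A common way to ship a private Python build with a streaming job is the -archives generic option, which unpacks a tarball on each task node. Roughly like this sketch (the tarball layout, HDFS path and script names are assumptions):

```shell
# Pack the relocatable Python 2.7 (with scipy) built under ./python27
tar -czf python27.tgz python27

${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/contrib/streaming/hadoop-streaming-*.jar \
    -archives hdfs:///user/me/python27.tgz#py \
    -input  ${input} \
    -output ${output} \
    -mapper  "py/python27/bin/python mapper.py" \
    -reducer "py/python27/bin/python reducer.py" \
    -file mapper.py \
    -file reducer.py
```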
[moving common-user@ to BCC]
Oozie is not HA yet. But it would be relatively easy to make it so. It was
designed with that in mind; we even did a prototype.
Oozie consists of 2 services: a SQL database to store the Oozie jobs' state,
and a servlet container where the Oozie app proper runs.
The solution
Thanks for your response. See comments below.
Regards, Per Steffensen
Alejandro Abdelnur wrote:
[moving common-user@ to BCC]
Oozie is not HA yet. But it would be relatively easy to make it. It was
designed with that in mind, we even did a prototype.
Ok, so if it isn't HA out-of-the-box I
Hi,
On Thu, Sep 1, 2011 at 9:08 AM, Raimon Bosch raimon.bo...@gmail.com wrote:
Hi,
I'm trying to create a table similar to apache_log, but I'm trying to avoid
writing my own map-reduce task because I don't want to have my HDFS files
twice.
So if you're working with log lines like this:
On Thu, Sep 1, 2011 at 7:58 PM, Per Steffensen st...@designware.dk wrote:
Thanks for your response. See comments below.
Regards, Per Steffensen
Alejandro Abdelnur wrote:
[moving common-user@ to BCC]
Oozie is not HA yet. But it would be relatively easy to make it. It was
designed with
Well, I am not sure I get you right, but anyway: basically I want a timer
framework that triggers my jobs. And the triggering of the jobs needs to
work even though one or two particular machines go down. So the timer
triggering mechanism has to live in the cluster, so to speak. What I
don't
On Thu, Sep 1, 2011 at 1:25 AM, Dieter Plaetinck
dieter.plaeti...@intec.ugent.be wrote:
On Wed, 31 Aug 2011 08:44:42 -0700
Mohit Anchlia mohitanch...@gmail.com wrote:
Does map-reduce work well with binary content in the file? This
binary content is basically some CAD files and the map-reduce
In Hadoop, if the client that triggers the job fails, is there a way to
recover and have another client submit the job?
On Thu, Sep 1, 2011 at 8:44 PM, Per Steffensen st...@designware.dk wrote:
Well I am not sure I get you right, but anyway, basically I want a timer
framework that triggers my
On Thu, Sep 1, 2011 at 8:37 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
Thanks! Is there a specific tutorial I can focus on to see how it could be
done?
Take the word count example and change its output format to be
SequenceFileOutputFormat.
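Spelled out against the 0.20.x mapred API, that change is one line in the otherwise stock WordCount driver. A sketch; the mapper/reducer class names below stand in for the ones in your copy of the example:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class SequenceWordCount {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SequenceWordCount.class);
        conf.setJobName("wordcount-seqfile");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // Mapper/reducer exactly as in the stock example (names assumed):
        conf.setMapperClass(WordCountMapper.class);
        conf.setReducerClass(WordCountReducer.class);
        // The one change: emit binary SequenceFiles instead of text output.
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
```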
01.09.11 18:14, Per Steffensen wrote:
Well, I am not sure I get you right, but anyway: basically I want a
timer framework that triggers my jobs. And the triggering of the jobs
needs to work even though one or two particular machines go down. So
the timer triggering mechanism has to live
Hey there,
I would like to do the cross product of two data sets, neither of which fits
in memory. I've seen Pig has the CROSS operation. Can someone please explain
to me how it implements it?
http://ofps.oreilly.com/titles/9781449302641/advanced_pig_latin.html
Search on "cross" for the relevant matches.
Alan.
On Sep 1, 2011, at 11:44 AM, Marc Sturlese wrote:
Hey there,
I would like to do the cross product of two data sets, neither of which fits
in memory. I've seen Pig has the CROSS operation. Can
Vitalii Tymchyshyn wrote:
01.09.11 18:14, Per Steffensen wrote:
Well, I am not sure I get you right, but anyway: basically I want a
timer framework that triggers my jobs. And the triggering of the jobs
needs to work even though one or two particular machines go down. So
the timer
Hi all,
I was wondering if anyone was familiar with this class. I want to
create multiple output files during my reduce.
My input files will consist of
name1  action1  date1
name1  action2  date2
name1  action3  date3
name2  action1  date1
name2  action2  date2
name2  action3  date3
My goal is to create files with
Hi all,
I am trying to install Hadoop (release 0.20.203) on a machine with CentOS.
When I try to start HDFS, I get the following error.
machine-name: Unrecognized option: -jvm
machine-name: Could not create the Java virtual machine.
Any idea what might be the problem?
Thanks,
Abhishek
Hi Hailong,
I have installed JDK and set JAVA_HOME correctly (as far as I know).
Output of java -version is:
java version "1.6.0_04"
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)
I also have another version installed, 1.6.0_27, but get
Actually, I found the reason. I am running HDFS as root and there is
a bug that has recently been fixed.
https://issues.apache.org/jira/browse/HDFS-1943
Thanks,
Abhishek
On Thu, Sep 1, 2011 at 6:25 PM, Ravi Prakash ravihad...@gmail.com wrote:
Hi Abhishek,
Try reading through the shell
Hi Matt,
On Jun 20, 2011, at 1:46pm, GOEKE, MATTHEW (AG/1000) wrote:
Has anyone else run into issues using output compression (in our case LZO) on
TestDFSIO, where it fails to read the metrics file? I just assumed
that it would use the correct decompression codec after it finishes
Hi Tim,
You could create a custom HashPartitioner so that all (key, value) pairs
denoting the actions of the same user end up in the same reducer; then you
need only one output file per reducer. Btw, how large are the output files?
Make sure you don't end up creating a lot of small files, i.e.,
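The partitioning rule described above can be sketched independently of the Hadoop API. The tab-separated key layout is an assumption; in the real job this logic would live in a class implementing org.apache.hadoop.mapred.Partitioner and be registered via conf.setPartitionerClass(...):

```java
// Sketch of "all actions of one user go to the same reducer".
// Assumes keys laid out as "name<TAB>action<TAB>date".
public class UserPartition {
    static int partitionFor(String key, int numPartitions) {
        String user = key.split("\t", 2)[0];   // partition on the user field only
        return (user.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Records of the same user map to the same reducer index.
        System.out.println(partitionFor("name1\taction1\tdate1", 4)
                        == partitionFor("name1\taction2\tdate2", 4)); // true
    }
}
```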