Hello Hao,
I'm sorry if I confused you. By CPUs I meant the CPUs visible to your OS
(/proc/cpuinfo), so yes, the total number of cores.
On 10-Jan-2012, at 12:39 PM, hao.wang wrote:
Hi,
Thanks for your reply!
Based on your suggestion, it seems I can't apply it to our Hadoop cluster,
because
Hi,
Is it possible to get data from web services using Hadoop MR jobs?
Regards,
Shreya
Hi,
Thanks for your help, your suggestion is very useful.
I have another question: should the sum of map and reduce slots equal
the total number of cores?
Regards,
2012-01-10
hao.wang
From: Harsh J
Sent: 2012-01-10 16:44:07
To: common-user
Cc:
Subject: Re: how to set
Mark,
[mark@node67 ~]$ telnet node77
You need to specify the port number along with the server name like `telnet
node77 1234`.
2012-01-09 10:04:03,436 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: localhost/127.0.0.1:12123. Already tried 0 time(s).
Slaves are not able to
Hi Hao,
Ideally you would want to leave out a core each for the TaskTracker and
DataNode processes on each node. The rest could be used for maps and
reducers.
Thanks,
Prashant
2012/1/10 hao.wang hao.w...@ipinyou.com
Hi,
Thanks for your help, your suggestion is very useful.
I have another
Yes, divide the number of cores between map and reduce slots. Depending on your
workload, start with a 4:3 ratio and work your way to better tuning eventually
(if you have more map-only jobs, adjust ratio accordingly, etc.).
Changing slot params requires TaskTracker restarts alone, not
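For concreteness, a minimal sketch of the per-TaskTracker slot settings being
discussed (the values shown are illustrative; derive yours from your core
count and the ratio above), in mapred-site.xml:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>4</value> <!-- map slots per TaskTracker -->
    </property>
    <property>
      <name>mapred.tasktracker.reduce.tasks.maximum</name>
      <value>3</value> <!-- reduce slots per TaskTracker -->
    </property>

Restart each TaskTracker after changing these.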
Thanks all for the advice - one more question on re-reading Harsh's helpful reply.
Intermediate (M-to-R) files use a custom IFile format these days. How
recent is "these days", and can this addition be pinned down to any one
version of Hadoop?
Tony
-----Original Message-----
From: Harsh J
Yes. Hive doesn't format data when you load it. The only exception is if you do
an INSERT OVERWRITE ...
-Joey
On Jan 10, 2012, at 6:08, Tony Burton tbur...@sportingindex.com wrote:
Thanks for this Bejoy, very helpful.
So, to summarise: when I CREATE EXTERNAL TABLE in Hive, the STORED
Tony,
Sorry for being ambiguous, I was too lazy to search at the time. This has been
the case since release 0.18.0. See
https://issues.apache.org/jira/browse/HADOOP-2095 for more information.
On 10-Jan-2012, at 4:18 PM, Tony Burton wrote:
Thanks all for advice - one more question on
Hi,
How do I set the maximum number of containers to be executed on
each node, so that at any time only that many containers will be
running on that node?
At the Cloudera course they said this is a bad idea, but I'm working at a place
that does just this... in the reducers. So the answer is: yes, you can make
HTTP requests in Hadoop jobs.
I'd like to know more about others' thoughts on this. Is it customary?
Jay Vyas
MMSB
UCHC
On Jan 10,
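As a minimal sketch of the pattern under discussion (the class name and
endpoint URL are hypothetical, and timeouts/error handling are the bare
minimum), a reducer issuing one HTTP request per key might look like:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class HttpLookupReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        // Hypothetical endpoint; one request per key keeps load bounded.
        URL url = new URL("http://example.com/lookup?id=" + key);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000);
        conn.setReadTimeout(5000);
        BufferedReader in = new BufferedReader(
            new InputStreamReader(conn.getInputStream()));
        try {
          context.write(key, new Text(in.readLine()));
        } finally {
          in.close();
          conn.disconnect();
        }
      }
    }

Whether this is wise is exactly the concern raised below: a large cluster can
hit one service with hundreds of concurrent requests.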
Hi.
I am no expert, but you could try this.
Your problem, I guess, is that the record reader reads multiple lines of
work (tasks) and gives them to each mapper, and thus if you only have a few
tasks (lines of work in the input file) Hadoop will not spawn multiple mappers.
You could try this: make
If you are looking to crawl websites, you can take a look at Apache Nutch and
how it connects with Apache Hadoop.
I'll let others comment on why we do not recommend this, but I can surely think
of a case where a large-slotted cluster having all its tasks hitting a
particular site at the same
Similarly there is the NLineInputFormat that does this automatically. If your
input is small it will read in the input and make a split for every N lines of
input. Then you don't have to reformat your data files.
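A sketch of the job setup (new-API NLineInputFormat; the path and job name
are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    public class NLineJobSetup {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "one-task-per-line");
        job.setInputFormatClass(NLineInputFormat.class);
        // One input line per split, so each line of work gets its own mapper.
        NLineInputFormat.setNumLinesPerSplit(job, 1);
        NLineInputFormat.addInputPath(job, new Path("/user/gorgo/tasks.txt"));
        // ... set mapper, output types, then job.waitForCompletion(true)
      }
    }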
--Bobby Evans
On 1/10/12 8:09 AM, GorGo gylf...@ru.is wrote:
Hi.
I am no
Hi everyone.
I am running C++ code using the PIPES wrapper and I am looking for some
tutorials, examples or any kind of help with regards to using binary data.
My problem is that I am working with large chunks of binary data and
converting it to text is not an option.
My first question is thus,
I think what you want to try to do is to use JNI rather than pipes or
streaming. PIPES has known issues and it is my understanding that its use is
now discouraged. The ideal way to do this is to use JNI to send your data to
the C code. Be aware that moving large amounts of data through JNI
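The JNI side reduces to a native method declaration plus a shared library; a
minimal sketch (library and method names are hypothetical):

    public class NativeChunkProcessor {
      static {
        // Loads libchunkproc.so from java.library.path; name is hypothetical.
        System.loadLibrary("chunkproc");
      }
      // Implemented in C/C++: receives raw bytes, returns processed bytes.
      public static native byte[] process(byte[] chunk);
    }

A mapper can then call NativeChunkProcessor.process(...) on each record's
bytes, keeping the data binary end to end.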
Hi Tony
Please find responses inline
So, to summarise: when I CREATE EXTERNAL TABLE in Hive, the STORED AS, ROW
FORMAT and other parameters you mention are telling Hive what to expect
when it reads the data I want to analyse, despite not checking the data to
see if it meets these criteria?
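Right: the DDL is a promise about layout, not a validation step. For instance,
a declaration like the following (schema and location are illustrative) only
tells Hive how to interpret the files under /data/logs when they are queried;
nothing is checked or rewritten at CREATE time:

    CREATE EXTERNAL TABLE logs (
      ts STRING,
      msg STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/logs';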
I have noticed this too with one job. Keys that are equal (.equals() is true,
hashCode() values match, and compareTo() == 0) are being sent to multiple
reduce tasks, therefore resulting in incorrect output.
Any insight?
On Sat, Aug 13, 2011 at 11:14 AM, Stan Rosenberg
srosenb...@proclivitysystems.com wrote:
Hi
The Hadoop framework reuses Writable objects for key and value arguments,
so if your code stores a pointer to that object instead of copying it you
can find yourself with mysterious duplicate objects. This has tripped me
up a number of times. Details on what exactly I encountered and how I fixed
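The classic fix, as a sketch (types are illustrative): copy each Writable
before storing it, because the framework hands you the same instance on every
iteration:

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class CopyingReducer extends Reducer<Text, Text, Text, Text> {
      @Override
      protected void reduce(Text key, Iterable<Text> values, Context context)
          throws IOException, InterruptedException {
        List<Text> seen = new ArrayList<Text>();
        for (Text v : values) {
          // seen.add(v) would be the bug: every stored reference would
          // alias the single Text object the framework keeps reusing.
          seen.add(new Text(v)); // copy the contents instead
        }
        for (Text v : seen) {
          context.write(key, v);
        }
      }
    }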
I'm (unfortunately) aware of this and this isn't the issue. My key object
contains only long, int and String values.
The job map output is consistent, but the reduce input groups and values
for the key vary from one job to the next on the same input. It's like it
isn't properly comparing and
Naturally, after I sent that email I found that I was wrong. I was also using
an enum field, which was the culprit.
On Tue, Jan 10, 2012 at 6:13 PM, William Kinney william.kin...@gmail.comwrote:
I'm (unfortunately) aware of this and this isn't the issue. My key object
contains only long, int and
Hi,
How can I specify which class's main method to run as a job when I do
MapReduce? Let's say my jar has 4 classes and each one of them has a main
method. I want to pass the class name in the 'hadoop jar jarfile
classname' command. This will be similar to running stock tools inside
HBase or other
You can use yarn.nodemanager.resource.memory-mb to set the limit on
each NodeManager.
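For example (the value is illustrative), in yarn-site.xml:

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>8192</value> <!-- total memory this NodeManager may hand out -->
    </property>

The number of containers running concurrently on a node is then bounded by
this total divided by the memory each container requests.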
You should have a good look at
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html
. It has enough information to get you a good distance.
HTH.
+Vinod
On Tue, Jan 10,
Yes, you can.
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html#Writing_an_ApplicationMaster
should give you a very good idea and example code about this.
But, the requirements are not hard-fixed. If the scheduler cannot find
free resources on
Hi,
I would like to bundle a binary with a hadoop job and call it from inside
the mappers/reducers.
The binary is a C++ program that I do not want to re-implement in Java. I
want to fork it as a subprocess from inside mappers/reducers and capture
the output (on stdout).
So, I need to get the
Hi all,
For the TextInputFormat class, the input key is a file position. This is
working well. But when I switch to LzoTextInputFormat to read LZO files, the
key does not make sense. It does not indicate file position. Is the file
position supported with LzoTextInputFormat?
Here is a job
Couldn't you write a simple wrapper around your binary, include the binary
using the -file option and use Streaming?
Or use the distributed cache to copy your binaries to all the compute
nodes.
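A sketch of the distributed-cache route, in two fragments (paths and names
are hypothetical; error handling omitted):

    // At job-submission time: ship the binary and request a symlink named
    // "mytool" in each task's working directory.
    DistributedCache.addCacheFile(new URI("/apps/bin/mytool#mytool"), conf);
    DistributedCache.createSymlink(conf);

    // Inside the mapper/reducer: fork the binary and capture its stdout.
    Process p = new ProcessBuilder("./mytool", "--some-arg").start();
    BufferedReader out = new BufferedReader(
        new InputStreamReader(p.getInputStream()));
    String line;
    while ((line = out.readLine()) != null) {
      // feed each line of the subprocess's output back into the job
    }
    p.waitFor();

(Imports assumed: java.net.URI, java.io.*, and
org.apache.hadoop.filecache.DistributedCache.)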
On Tue, Jan 10, 2012 at 5:01 PM, Daren Hasenkamp dhasenk...@berkeley.eduwrote:
Hi,
I would like to
Hi Vinod
You can use the format
hadoop jar jarName className
e.g., hadoop jar /home/user/sample.jar com.sample.apps.MainClass ..
Don't specify the main class while packaging your jar. This lets you
incorporate multiple entry points in the same jar for different functionality.
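Another option, modeled on how Hadoop's own examples jar dispatches by name
(the tool class names here are hypothetical), is a small ProgramDriver entry
point:

    import org.apache.hadoop.util.ProgramDriver;

    public class Driver {
      public static void main(String[] args) {
        ProgramDriver pgd = new ProgramDriver();
        try {
          // Register each tool under a short name.
          pgd.addClass("toola", ToolA.class, "Runs tool A");
          pgd.addClass("toolb", ToolB.class, "Runs tool B");
          pgd.driver(args); // dispatches to the named tool's main()
        } catch (Throwable e) {
          e.printStackTrace();
        }
      }
    }

Set Driver as the jar's main class, then run: hadoop jar sample.jar toola ...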