Hi Rahul,
Thanks for the information; I got your point. What I specifically want to ask
is this: if I use the following method to read my file in each mapper:
FileSystem hdfs = FileSystem.get(conf);
URI[] uris = DistributedCache.getCacheFiles(conf);
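i.e., fleshed out, something along these lines (just a rough sketch, called from each mapper's configure()/setup(); the class and variable names are only examples):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CacheFileReader {
  // Reads every file registered in the DistributedCache
  // (added with DistributedCache.addCacheFile(...) in the driver).
  public static void readCacheFiles(Configuration conf) throws IOException {
    FileSystem hdfs = FileSystem.get(conf);
    URI[] uris = DistributedCache.getCacheFiles(conf);
    if (uris == null) {
      return; // nothing was cached
    }
    for (URI uri : uris) {
      BufferedReader reader =
          new BufferedReader(new InputStreamReader(hdfs.open(new Path(uri.getPath()))));
      try {
        String line;
        while ((line = reader.readLine()) != null) {
          // e.g. load each line into an in-memory lookup table
        }
      } finally {
        reader.close();
      }
    }
  }
}

(As opposed to DistributedCache.getLocalCacheFiles(conf), which would give the task-local on-disk copies instead of going back to HDFS.)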
Thank you very much Ken.
What is the new replacement for
reporter.setStatus
reporter.incrCounter
and how can I access the reporter?
Thanks in Advance,
Vitaliy S
On Thu, Jul 8, 2010 at 8:12 PM, Ken Goodhope kengoodh...@gmail.com wrote:
The reporter and the outputcollector have both been rolled up into the context object.
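Roughly, with the new org.apache.hadoop.mapreduce API it looks something like this (just a sketch; the counter group/name and types are only examples):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.setStatus("processing " + key);                 // was reporter.setStatus(...)
    context.getCounter("MyGroup", "records").increment(1);  // was reporter.incrCounter(...)
    context.write(value, new IntWritable(1));               // was output.collect(...)
  }
}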
John,
Can you please redirect this to pig-u...@hadoop.apache.org? You're
more likely to get good responses there.
Thanks
hemanth
On Thu, Jul 8, 2010 at 7:01 AM, John Seer pulsph...@yahoo.com wrote:
Hello, is there any way to share a schema file in Pig for the same table between
projects?
--
Edward,
Overall, I think the consideration should be how much load you expect
to support on your cluster. For HDFS, there's a good amount
of information about how much RAM is required to support a certain
amount of data stored in DFS; something similar can be found for
Map/Reduce as
Hello everyone,
I have a cluster of 8 datanodes and a namenode.
When I start the teragen program everything works OK and the data is generated. But
when I start the terasort program, it seems like only 2 datanodes do the job.
And everything is so slow. I've tried with only 10 records and cluster
Hello,
I have set mapred.job.reuse.jvm.num.tasks to -1 for re-using the JVM.
My intention is to run a helper program at the beginning of the job and then
feed the key/value pairs
from the tasks to the helper program.
Currently I am running it in the call to setup() below.
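Roughly what I am doing (a simplified sketch; "helper.sh" stands in for my actual helper program):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HelperMapper extends Mapper<LongWritable, Text, Text, Text> {
  // Shared across all tasks that run in the same re-used JVM.
  private static Process helper;

  @Override
  protected void setup(Context context) throws IOException {
    synchronized (HelperMapper.class) {
      if (helper == null) {
        // With mapred.job.reuse.jvm.num.tasks = -1 this should only
        // happen once per task JVM.
        helper = new ProcessBuilder("helper.sh").start();
      }
    }
  }
}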
If JVM Task re-use is -1,
I would guess that you didn't set the number of reducers for the job,
and it defaulted to 2.
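If so, you can set it explicitly when launching terasort, e.g. (assuming your examples jar goes through the generic options parser, which it should in 0.20):

hadoop jar hadoop-*-examples.jar terasort -D mapred.reduce.tasks=16 /terasort-in /terasort-out

The value 16 and the paths are just examples; pick values suited to your 8 nodes.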
-- Owen
Hemanth,
Thank you for the elaborate explanation.
First of all, the total swap size is over 4 gigabytes, but the
actual used size is around several hundred kilobytes.
So I guess I can use almost the whole 4 gigabytes of physical memory.
The sentence streaming does not allow enough memory for
Today is your last chance to submit a CFP abstract for the 2010 Surge
Scalability Conference. The event is taking place on Sept 30 and Oct 1,
2010 in Baltimore, MD. Surge focuses on case studies that address
production failures and the re-engineering efforts that led to victory
in Web
Hi all,
A quick note that I'll be the instructor for the next Hadoop Bootcamp
training course from Scale Unlimited.
It's a two day class on July 22nd and 23rd, which covers the usual
high (and low) points of Hadoop.
Plus bonus material on using Hadoop with machine learning, generating
Awesome course - I took the historic first one, and benefited a lot. Great
that Ken is going to teach it.
Mark
On Fri, Jul 9, 2010 at 9:31 AM, Ken Krugler kkrug...@scaleunlimited.com wrote:
Hi all,
A quick note that I'll be the instructor for the next Hadoop Bootcamp
training course from
I am looking into Chukwa to collect/aggregate our search logs from across
multiple hosts. As I understand it, I need to have an agent/adaptor running on
each host, which in turn forwards this to a collector (across the
network), which then writes out to HDFS. Correct?
Does Hadoop need to
Your understanding of how Chukwa works is correct.
Hadoop by itself is a system that contains both the HDFS and the MapReduce
systems. The other projects you list are all built upon Hadoop,
but you don't need them to run Hadoop or to get value out of it by itself.
To run the Chukwa agent
Does anyone know what might be causing this error? I am using Hadoop version
0.20.2 and it happens when I run bin/hadoop dfs -copyFromLocal ...
10/07/09 15:51:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.io.IOException: Bad connect ack with firstBadLink 128.238.55.43:50010
Please see the description about xcievers at:
http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements
You can confirm that you have an xcievers problem by grepping the
datanode logs for the error message pasted in the last bullet point.
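If that turns out to be it, the usual fix described there is to raise dfs.datanode.max.xcievers in hdfs-site.xml on every datanode and restart them, along the lines of:

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

(4096 is just the commonly suggested value, not something specific to your cluster.)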
On Fri, Jul 9, 2010 at 1:10 PM, Raymond
Hi folks,
Total newbie so if this isn't the place to ask, please forgive me. I
have a business-focused social network application, kinda like LinkedIn,
to develop. One of the requirements is that any new user who joins
must rank all existing users based on his profile, and they must in
turn
Hi Ted, thanks for your reply. That does not seem to make a difference
though. I put that property in the xml file, restarted everything, tried to
transfer the file again but the same thing occurred.
I had my cluster working perfectly for about a year but I recently had some disk failures
Do you happen to see something similar to:
10/03/17 15:47:58 WARN hdfs.DFSClient: NotReplicatedYetException sleeping
/user/perserver/data/575Gb/ps_es_mstore_events_fact.txt retries left 4
10/03/17 15:47:58 INFO hdfs.DFSClient:
org.apache.hadoop.ipc.RemoteException:
This is a question I should go and test out myself but was wondering
if anyone has a quick answer.
We have map/reduce jobs that produce lots of small files into a folder.
We also have a hive external table pointed at this folder.
We have a tool FileCrusher which is made to bunch up multiple small
hgahlot wrote:
I had the same problem, but Amreshwari's suggestion solved it. I am porting
code from the 0.18.3 API to the 0.20.2 API. I am now facing problems with
setting keys through the Configuration object. The value set during
configuration using conf.setBoolean(String name, boolean value)
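For reference, the pattern I am trying is roughly this (the property name "my.flag" is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Driver side: Job makes its own copy of the Configuration, so values
// must be set before constructing the Job (or via job.getConfiguration()).
Configuration conf = new Configuration();
conf.setBoolean("my.flag", true);
Job job = new Job(conf, "example");

// Task side, e.g. in the mapper's setup(Context context):
boolean flag = context.getConfiguration().getBoolean("my.flag", false);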