Re: reading distributed cache returns null pointer

2010-07-09 Thread Denim Live
Hi Rahul, Thanks for the information. I got your point. What I specifically want to ask is that if I use the following method to read my file now in each mapper: FileSystem hdfs = FileSystem.get(conf); URI[] uris = DistributedCache.getCacheFiles(conf);

Re: How to access Reporter in new API?

2010-07-09 Thread Vitaliy Semochkin
Thank you very much Ken. What is the new replacement for reporter.setStatus and reporter.incrCounter, and how can I access the reporter? Thanks in advance, Vitaliy S On Thu, Jul 8, 2010 at 8:12 PM, Ken Goodhope kengoodh...@gmail.com wrote: The reporter and the outputcollector have all been rolled up

Re: Pig share schema between projects

2010-07-09 Thread Hemanth Yamijala
John, Can you please redirect this to pig-u...@hadoop.apache.org ? You're more likely to get good responses there. Thanks hemanth On Thu, Jul 8, 2010 at 7:01 AM, John Seer pulsph...@yahoo.com wrote: Hello, Is there any way to share schema file in pig for the same table between projects? --

Re: Is heap size allocation of namenode dynamic or static?

2010-07-09 Thread Hemanth Yamijala
Edward, Overall, I think the consideration should be about how much load you expect to support on your cluster. For HDFS, there's a good amount of information about how much RAM is required to support a certain amount of data stored in DFS; something similar can be found for Map/Reduce as

Terasort problem

2010-07-09 Thread Tonci Buljan
Hello everyone, I have a cluster of 8 datanodes and a namenode. When I start the teragen program everything works OK, and the data is generated. But when I start the terasort program, it seems that only 2 datanodes do the job. And everything is so slow. I've tried with only 10 records and cluster

Task JVM reuse: A Question regarding hadoop 0.20.1

2010-07-09 Thread Saptarshi Guha
Hello, I have set mapred.job.reuse.jvm.num.tasks to -1 for re-using the JVM. My intention is to run a helper program at the beginning of the job and then feed the key/value pairs from the tasks to the helper program. Currently I am running it in the call to setup below. If JVM Task re-use is -1,

Re: Terasort problem

2010-07-09 Thread Owen O'Malley
I would guess that you didn't set the number of reducers for the job, and it defaulted to 2. -- Owen

Re: Is heap size allocation of namenode dynamic or static?

2010-07-09 Thread edward choi
Hemanth, Thank you for the elaborate explanation. First of all, the total swap memory size is over 4 gigabytes, but the actual used size is around several hundred kilobytes. So I guess I can use almost the whole 4 gigabytes of physical memory. The sentence streaming does not allow enough memory for

Last day to submit your Surge 2010 CFP!

2010-07-09 Thread Jason Dixon
Today is your last chance to submit a CFP abstract for the 2010 Surge Scalability Conference. The event is taking place on Sept 30 and Oct 1, 2010 in Baltimore, MD. Surge focuses on case studies that address production failures and the re-engineering efforts that led to victory in Web

Hadoop Training

2010-07-09 Thread Ken Krugler
Hi all, A quick note that I'll be the instructor for the next Hadoop Bootcamp training course from Scale Unlimited. It's a two day class on July 22nd and 23rd, which covers the usual high (and low) points of Hadoop. Plus bonus material on using Hadoop with machine learning, generating

Re: Hadoop Training

2010-07-09 Thread Mark Kerzner
Awesome course - I took the historic first one, and benefited a lot. Great that Ken is going to teach it. Mark On Fri, Jul 9, 2010 at 9:31 AM, Ken Krugler kkrug...@scaleunlimited.com wrote: Hi all, A quick note that I'll be the instructor for the next Hadoop Bootcamp training course from

Chukwa questions

2010-07-09 Thread Blargy
I am looking into Chukwa to collect/aggregate our search logs from across multiple hosts. As I understand it I need to have an agent/adaptor running on each host which then in turn forwards this to a collector (across the network) which will then write out to HDFS. Correct? Does Hadoop need to

Re: Chukwa questions

2010-07-09 Thread Bill Graham
Your understanding of how Chukwa works is correct. Hadoop by itself is a system that contains both the HDFS and the MapReduce systems. The other projects you list are all projects built upon Hadoop, but you don't need them to run or to get value out of Hadoop by itself. To run the Chukwa agent

Help with Hadoop runtime error

2010-07-09 Thread Raymond Jennings III
Does anyone know what might be causing this error? I am using Hadoop version 0.20.2 and it happens when I run bin/hadoop dfs -copyFromLocal ... 10/07/09 15:51:45 INFO hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 128.238.55.43:50010

Re: Help with Hadoop runtime error

2010-07-09 Thread Ted Yu
Please see the description about xcievers at: http://hbase.apache.org/docs/r0.20.5/api/overview-summary.html#requirements You can confirm that you have a xcievers problem by grepping the datanode logs with the error message pasted in the last bullet point. On Fri, Jul 9, 2010 at 1:10 PM, Raymond

Newbie question...is hadoop right for my app

2010-07-09 Thread Brian
Hi folks, Total newbie so if this isn't the place to ask, please forgive me. I have a business focused social network application kinda like linkedin to develop. One of the requirements is that any new user who joins must rank all existing users based on his profile and they must in turn

Re: Help with Hadoop runtime error

2010-07-09 Thread Raymond Jennings III
Hi Ted, thanks for your reply. That does not seem to make a difference though. I put that property in the xml file, restarted everything, tried to transfer the file again but the same thing occurred. I had my cluster working perfectly for about a year but I recently had some disk failures

Re: Help with Hadoop runtime error

2010-07-09 Thread Ted Yu
Do you happen to see something similar to: 10/03/17 15:47:58 WARN hdfs.DFSClient: NotReplicatedYetException sleeping /user/perserver/data/575Gb/ps_ es_mstore_events_fact.txt retries left 4 10/03/17 15:47:58 INFO hdfs.DFSClient: org.apache.hadoop.ipc.RemoteException:

HDFS moves and MapReduce jobs

2010-07-09 Thread Edward Capriolo
This is a question I should go and test out myself but was wondering if anyone has a quick answer. We have map/reduce jobs that produce lots of smaller files to a folder. We also have a hive external table pointed at this folder. We have a tool FileCrusher which is made to bunch up multiple small

Re: Distributed Cache with New API

2010-07-09 Thread hgahlot
hgahlot wrote: I had the same problem but Amreshwari's suggestion solved it. I am porting code from the 0.18.3 API to the 0.20.2 API. I am now facing problems with the setting of keys through the Configuration object. The value set during configuration using conf.setBoolean(String value, default