Re: Non utf-8 chars in input
Hi Ajay,

Try SequenceFileAsBinaryInputFormat?

Thanks
Rekha

On 11/09/12 11:24 AM, Ajay Srivastava ajay.srivast...@guavus.com wrote:
Hi,
I am using the default input format class for reading input from text files, but the input file has some non-UTF-8 characters. I guess that TextInputFormat is the default input format class and that it replaces these non-UTF-8 chars with \uFFFD. If I do not want this behavior and need the actual characters in my mapper, what is the correct input format class?
Regards,
Ajay Srivastava
what happens when a datanode rejoins?
Hi,

What happens when an existing (not new) datanode rejoins a cluster, for the following scenarios:
1. Some of the blocks it was managing are deleted/modified?
2. The size of the blocks is now modified, say from 64MB to 128MB?
3. What if the block replication factor was one (not typical in most deployments, but say it was) - does the namenode recreate a file once the datanode rejoins?

Thanks,
Mehul
Re: what happens when a datanode rejoins?
Hi Mehul

> Some of the blocks it was managing are deleted/modified?
The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes.

> The size of the blocks are now modified say from 64MB to 128MB?
Block size is a per-file setting, so new files will be 128MB, but the old ones will remain at 64MB.

> What if the block replication factor was one (yea not in most deployments but say incase) so does the namenode recreate a file once the datanode rejoins?
(Assuming you didn't perform a decommission.) Blocks that lived only on that datanode will be declared missing, and the files associated with those blocks cannot be fully read until the datanode rejoins.

George
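To illustrate that block size is a per-file, write-time setting: the FileSystem API takes the block size when the file is created. A minimal sketch, assuming a Hadoop client on the classpath (the path, buffer size, and sizes below are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Each file records its own block size at creation time; changing the
    // cluster default later does not rewrite existing files.
    FSDataOutputStream out = fs.create(new Path("/tmp/newfile"),
        true,                // overwrite if it exists
        4096,                // io buffer size
        (short) 3,           // replication factor
        128L * 1024 * 1024); // 128MB block size, for this file only
    out.writeBytes("hello");
    out.close();
  }
}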
Re: what happens when a datanode rejoins?
Mehul,

Let me make an addition.

> Some of the blocks it was managing are deleted/modified?
Blocks that are deleted in the interim will be deleted on the rejoining node as well, after it rejoins. Regarding modification, I'd advise against modifying blocks after they have been fully written.

George
Re: what happens when a datanode rejoins?
George has answered most of these. I'll just add on:

On Tue, Sep 11, 2012 at 12:44 PM, Mehul Choube mehul_cho...@symantec.com wrote:

> 1. Some of the blocks it was managing are deleted/modified?
A DN runs a block report upon start, and sends the list of blocks to the NN. The NN validates them, and if it finds any files missing block replicas after the report, it will schedule a re-replication from one of the good DNs that still carry them. Blocks modified outside of HDFS fail their stored checksums, so they are treated as corrupt, deleted, and re-replicated in the same manner.

> 2. The size of the blocks are now modified say from 64MB to 128MB?
George's got this already. Changing the block size does not impact any existing blocks. It is a per-file metadata property.

> 3. What if the block replication factor was one (yea not in most deployments but say incase) so does the namenode recreate a file once the datanode rejoins?
Files exist as NN metadata (its fsimage/edits persist this). The blocks pertaining to a file exist at DNs. If the file had a single replica and that replica was lost, then the file's data is lost, and the NameNode will tell you as much in its metrics/fsck.

--
Harsh J
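If losing a single-replica file this way is a concern, the replication factor can also be raised per file after the fact; a minimal sketch (the path is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RaiseReplication {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // Ask the NN to keep 3 replicas of this file's blocks from now on;
    // the actual re-replication happens asynchronously in the background.
    boolean accepted = fs.setReplication(new Path("/data/important.txt"), (short) 3);
    System.out.println("replication change accepted: " + accepted);
  }
}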
Re: how to make different mappers execute different processing on same data ?
Hi Jason,

What Mehmet said is exactly correct. You can add mappers and reducers for any processing of the data, compare the outputs, and pick the solution with the best performance.

Thanks & Regards,
Ramesh.Narasingu

On Tue, Sep 11, 2012 at 9:31 AM, Mehmet Tepedelenlioglu mehmets...@gmail.com wrote:
If you have n processes to evaluate, make a reducer that calls process i when it receives key i, 1 <= i <= n. Either replicate the data for the n reducers, or cache it to be read on the reducer side. The reducers will output the process id i and the performance.

On Sep 10, 2012, at 8:25 PM, Jason Yang wrote:
Hi, all
I've got a question about how to make different mappers execute different processing on the same data. Here is my scenario: I have some data to process, but there are multiple ways to process it and I have no idea which one is better. So I was thinking that maybe I could execute multiple mappers, each applying a different processing solution, and eventually the best one is chosen according to some evaluation functions. But I'm not sure whether this could be done in MapReduce. Any help would be appreciated.
--
YANG, Lin
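A minimal sketch of the reducer-side dispatch Mehmet describes, assuming the mapper emits each record n times under keys 1..n (the evaluateSolution* methods are placeholders for the actual processing solutions):

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DispatchReducer
    extends Reducer<IntWritable, Text, IntWritable, DoubleWritable> {
  @Override
  protected void reduce(IntWritable processId, Iterable<Text> records, Context context)
      throws IOException, InterruptedException {
    double score;
    // Key i selects processing solution i; each reducer sees the full data
    // set for its solution because the mapper replicated every record.
    switch (processId.get()) {
      case 1: score = evaluateSolutionA(records); break;
      case 2: score = evaluateSolutionB(records); break;
      default: throw new IOException("unknown process id: " + processId);
    }
    // Emit (process id, performance) so the best solution can be picked afterwards.
    context.write(processId, new DoubleWritable(score));
  }

  // Placeholder evaluations; each consumes the records for its solution.
  private double evaluateSolutionA(Iterable<Text> records) { return 0.0; }
  private double evaluateSolutionB(Iterable<Text> records) { return 0.0; }
}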
Re: Non utf-8 chars in input
Rekha,

I guess the problem is that the Text class uses UTF-8 encoding, and one cannot set another encoding for this class. I have not seen any other Text-like class which supports other encodings; otherwise I would have written my own custom input format class.
Thanks for your inputs.

Regards,
Ajay Srivastava

On 11-Sep-2012, at 1:31 PM, Joshi, Rekha wrote:
Actually even if that works, it does not seem an ideal solution. I think format and encoding are distinct, and enforcing a format must not enforce an encoding. So that means there must be a possibility to pass the encoding as a user choice on construction, e.g. TextInputFormat(your-encoding). But I do not see that in the API, so even if I extend InputFormat/RecordReader, I will not be able to have a setEncoding() feature on my file format. Having that would be a good solution.

Thanks
Rekha

On 11/09/12 12:37 PM, Joshi, Rekha rekha_jo...@intuit.com wrote:
Hi Ajay,
Try SequenceFileAsBinaryInputFormat?
Thanks
Rekha

On 11/09/12 11:24 AM, Ajay Srivastava ajay.srivast...@guavus.com wrote:
Hi,
I am using the default input format class for reading input from text files, but the input file has some non-UTF-8 characters. I guess that TextInputFormat is the default input format class and that it replaces these non-UTF-8 chars with \uFFFD. If I do not want this behavior and need the actual characters in my mapper, what is the correct input format class?
Regards,
Ajay Srivastava
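One workaround: TextInputFormat's record reader copies the raw line bytes into the Text value without decoding them; it is Text.toString() that assumes UTF-8 and substitutes \uFFFD. So the mapper can decode the bytes itself with the file's real charset. A minimal sketch (ISO-8859-1 is an assumed encoding; this only works for encodings in which the newline byte is unambiguous, since line splitting still happens on raw \n):

import java.io.IOException;
import java.nio.charset.Charset;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RawBytesMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
  private static final Charset LATIN1 = Charset.forName("ISO-8859-1");

  @Override
  protected void map(LongWritable offset, Text value, Context context)
      throws IOException, InterruptedException {
    // value.getBytes() holds the untouched bytes read from the file; decode
    // with the actual encoding instead of calling value.toString().
    String line = new String(value.getBytes(), 0, value.getLength(), LATIN1);
    // ... process the correctly decoded line; note that writing it back out
    // through Text re-encodes it as UTF-8.
    context.write(new Text(line), NullWritable.get());
  }
}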
Re: configure hadoop-0.22 fairscheduler
Hi Harsh,

Thanks for your reply, and I am sorry for my unclear description. As I mentioned previously, I think I configured the fair scheduler correctly in hadoop-0.22.0. But when I submit lots of jobs - many big jobs (whose map and reduce counts are bigger than the map/reduce slots) submitted first, and many small jobs (just 1-2 maps/reduces per job) submitted later - I find on the jobtracker HTTP page that it used the JobQueueTaskScheduler, and even the http://jobtracker:port/scheduler page is not found. While the big jobs are running, the small jobs can't start before the big jobs complete. So I guess either hadoop-0.22 does not support the fair scheduler, or I configured something wrong.

Focusing on MySQL, MSSQL, Oracle, Hadoop

2012/9/8 Harsh J ha...@cloudera.com
Hey Jameson,
When calling something inefficient, perhaps also share some details on how/why/what? How else would we know what you wish to see and what you're seeing instead? :)

On Thu, Sep 6, 2012 at 2:47 PM, Jameson Li hovlj...@gmail.com wrote:
I want to test version hadoop-0.22, but when configuring the fair scheduler I had some trouble: the fair scheduler is not efficient. I have configured these items in mapred-site.xml, and I have also copied the fair scheduler jar file to $HADOOP_HOME/lib:

<property>
  <name>mapreduce.jobtracker.taskScheduler</name>
  <value>org.apache.hadoop.mapred.FairScheduler</value>
</property>
<property>
  <name>mapred.fairscheduler.allocation.file</name>
  <value>conf/pools.xml</value>
</property>
<property>
  <name>mapred.fairscheduler.preemption</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.assignmultiple</name>
  <value>true</value>
</property>
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>mapred.queue.name</value>
  <description>job.set("mapred.queue.name", pool); // pool is set to either 'high' or 'low'</description>
</property>
<property>
  <name>mapred.queue.names</name>
  <value>default,aaa,bbb</value>
</property>

And the content of pools.xml in $HADOOP_HOME/conf:

<?xml version="1.0"?>
<allocations>
  <pool name="putindb">
    <minMaps>72</minMaps>
    <minReduces>16</minReduces>
    <maxRunningJobs>20</maxRunningJobs>
    <weight>3.0</weight>
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </pool>
  <pool name="machinelearning">
    <minMaps>9</minMaps>
    <minReduces>2</minReduces>
    <maxRunningJobs>10</maxRunningJobs>
    <weight>2.0</weight>
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </pool>
  <pool name="default">
    <minMaps>9</minMaps>
    <minReduces>2</minReduces>
    <maxRunningJobs>10</maxRunningJobs>
    <weight>1.0</weight>
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </pool>
  <defaultMinSharePreemptionTimeout>60</defaultMinSharePreemptionTimeout>
  <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
</allocations>

Can someone help me?

Focusing on MySQL, MSSQL, Oracle, Hadoop

--
Harsh J
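For reference, with mapred.fairscheduler.poolnameproperty set to mapred.queue.name as above, a job selects its pool by setting that property, as the description in the config hints; a minimal sketch (the pool name must be one defined in pools.xml):

import org.apache.hadoop.mapred.JobConf;

public class PoolSelection {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Route this job to the "putindb" pool from pools.xml; jobs that set
    // nothing fall into the "default" pool.
    conf.set("mapred.queue.name", "putindb");
  }
}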
RE: build failure - trying to build hadoop trunk checkout
Changing the hostname to lowercase fixed this particular problem - thanks for your replies. The build is failing elsewhere now; I'll post a new thread for that.

Tony

From: Tony Burton [mailto:tbur...@sportingindex.com]
Sent: 10 September 2012 10:44
To: user@hadoop.apache.org
Subject: RE: build failure - trying to build hadoop trunk checkout

Thanks for the replies all. I'll investigate changing my hostname and report back. (Seems a bit hacky though - can someone explain using easy words why this happens in Kerberos?)

Tony

From: Vinod Kumar Vavilapalli [mailto:vino...@hortonworks.com]
Sent: 06 September 2012 18:51
To: user@hadoop.apache.org
Subject: Re: build failure - trying to build hadoop trunk checkout

Never mind filing. I recalled that we debugged this issue long time back and cornered this down to problems with Kerberos. See https://issues.apache.org/jira/browse/HADOOP-7988. Given that, Tony, changing your hostname seems to be the only option.

Thanks,
+Vinod

On Sep 6, 2012, at 4:24 AM, Steve Loughran wrote:
file this as a JIRA against hadoop-common, though your machine's hostname looks suspiciously close to breaking the RFC-952 rules on hostnames
-steve

On 5 September 2012 14:49, Tony Burton tbur...@sportingindex.com wrote:
Hi,
I'm following the steps at http://wiki.apache.org/hadoop/HowToContribute for building hadoop in preparation for submitting a patch. I've checked out the trunk, and when I run mvn test from the top-level directory, I get the following error (snipped for clarity):

Results :
Failed tests:
testLocalHostNameForNullOrWild(org.apache.hadoop.security.TestSecurityUtil): expected:hdfs/tonyb-[Precision-WorkS]tation-390@REALM but was:hdfs/tonyb-[precision-works]tation-390@REALM

The test in question is:

@Test
public void testLocalHostNameForNullOrWild() throws Exception {
  String local = SecurityUtil.getLocalHostName();
  assertEquals("hdfs/" + local + "@REALM",
      SecurityUtil.getServerPrincipal("hdfs/_HOST@REALM", (String) null));
  assertEquals("hdfs/" + local + "@REALM",
      SecurityUtil.getServerPrincipal("hdfs/_HOST@REALM", "0.0.0.0"));
}

And my hostname is tonyb-Precision-WorkStation-390. Any idea how I can get this test to pass?

Thanks,
Tony
Re: Can't run PI example on hadoop 0.23.1
Hi Vinod,

Please check whether the input file location and output file location don't match. Put the input file into HDFS first and then run the MR job; it should work fine.

Thanks & Regards,
Ramesh.Narasingu

On Tue, Sep 11, 2012 at 4:23 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote:
The AM corresponding to your MR job is failing continuously. Can you check the container logs for your AM? They should be in ${yarn.nodemanager.log-dirs}/${application-id}/container_[0-9]*_0001_01_01/stderr

Thanks,
+Vinod

On Sep 10, 2012, at 3:19 PM, Smarty Juice wrote:
Hello Champions,
I am a newbie to hadoop and trying to run the first hadoop example, PI, but I get a FileNotFoundException. I do not know why it's looking for this file on HDFS; anyone please help.
Thank you,

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-0.23.1.jar pi -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory -libjars share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-0.23.1.jar 16 1
12/09/06 19:56:10 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
Number of Maps = 16
Samples per Map = 1
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Starting Job
12/09/06 19:56:22 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/09/06 19:56:22 INFO input.FileInputFormat: Total input paths to process : 16
12/09/06 19:56:23 INFO mapreduce.JobSubmitter: number of splits:16
12/09/06 19:56:25 INFO mapred.ResourceMgrDelegate: Submitted application application_1346972652940_0002 to ResourceManager at /0.0.0.0:8040
12/09/06 19:56:27 INFO mapreduce.Job: The url to track the job: http://1.1.1.11:8088/proxy/application_1346972652940_0002/
12/09/06 19:56:27 INFO mapreduce.Job: Running job: job_1346972652940_0002
12/09/06 19:56:55 INFO mapreduce.Job: Job job_1346972652940_0002 running in uber mode : false
12/09/06 19:56:55 INFO mapreduce.Job: map 0% reduce 0%
12/09/06 19:56:56 INFO mapreduce.Job: Job job_1346972652940_0002 failed with state FAILED due to: Application application_1346972652940_0002 failed 1 times due to AM Container for appattempt_1346972652940_0002_01 exited with exitCode: 1 due to: .Failing this attempt.. Failing the application.
12/09/06 19:56:56 INFO mapreduce.Job: Counters: 0
Job Finished in 35.048 seconds
java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/user/X/QuasiMonteCarlo_TMP_3_141592654/out/reduce-out
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:729)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1685)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1709)
    at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
    at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:351)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:360)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:200)
Re: hadoop trunk build failure - yarn, surefire related?
Hi,

Please find i think one command is there then only build the all applications.

Thanks & Regards,
Ramesh.Narasingu

On Tue, Sep 11, 2012 at 2:28 PM, Tony Burton tbur...@sportingindex.com wrote:
Hi,
I've checked out the hadoop trunk, and I'm running "mvn test" on the codebase as part of following the "How To Contribute" guide at http://wiki.apache.org/hadoop/HowToContribute. The tests are currently failing in hadoop-mapreduce-client-jobclient; the failure message is below - something to do with org.hadoop.yarn.client and/or the maven surefire plugin. Can anyone suggest how to proceed? Thanks!

[INFO] Reactor Summary:
[INFO] [snipped SKIPPED test groups]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s]
[INFO] BUILD FAILURE
[INFO] Total time: 17.229s
[INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
[INFO] Final Memory: 21M/163M
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project hadoop-mapreduce-client-jobclient: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed: java.lang.reflect.InvocationTargetException; nested exception is java.lang.reflect.InvocationTargetException: null: org/hadoop/yarn/client/YarnClientImpl: org.hadoop.yarn.client.YarnClientImpl - [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
RE: what happens when a datanode rejoins?
> The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes.

What happens when the datanode rejoins after the namenode has already re-replicated the blocks it was managing? Will the namenode ask the datanode to discard the blocks and start managing new blocks? Or will the namenode discard the new blocks which were replicated due to the unavailability of this datanode?

Thanks,
Mehul

From: George Datskos [mailto:george.dats...@jp.fujitsu.com]
Sent: Tuesday, September 11, 2012 12:56 PM
To: user@hadoop.apache.org
Subject: Re: what happens when a datanode rejoins?

Hi Mehul

> Some of the blocks it was managing are deleted/modified?
The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes.

> The size of the blocks are now modified say from 64MB to 128MB?
Block size is a per-file setting, so new files will be 128MB, but the old ones will remain at 64MB.

> What if the block replication factor was one (yea not in most deployments but say incase) so does the namenode recreate a file once the datanode rejoins?
(Assuming you didn't perform a decommission.) Blocks that lived only on that datanode will be declared missing, and the files associated with those blocks cannot be fully read until the datanode rejoins.

George
Re: what happens when a datanode rejoins?
Hi,

Inline.

On Tue, Sep 11, 2012 at 2:36 PM, Mehul Choube mehul_cho...@symantec.com wrote:

> What happens when the datanode rejoins after namenode has already re-replicated the blocks it was managing?
The block count total goes +1, and the file's block is treated as an over-replicated one.

> Will namenode ask the datanode to discard the blocks and start managing new blocks?
Yes, this may happen.

> Or will namenode discard the new blocks which were replicated due to unavailability of this datanode?
It deletes extra blocks while still keeping the block placement policy in mind. It may delete any block replica as long as the placement policy is not violated by doing so.

--
Harsh J
RE: what happens when a datanode rejoins?
> DataNode rejoins take care of only NameNode.
Sorry, didn't get this.

From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com]
Sent: Tuesday, September 11, 2012 2:38 PM
To: user@hadoop.apache.org
Subject: Re: what happens when a datanode rejoins?

Hi Mehul,
DataNode rejoins take care of only NameNode.
Thanks & Regards,
Ramesh.Narasingu

On Tue, Sep 11, 2012 at 2:36 PM, Mehul Choube mehul_cho...@symantec.com wrote:
> The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes.
What happens when the datanode rejoins after the namenode has already re-replicated the blocks it was managing? Will the namenode ask the datanode to discard the blocks and start managing new blocks? Or will the namenode discard the new blocks which were replicated due to the unavailability of this datanode?
Thanks,
Mehul

From: George Datskos [mailto:george.dats...@jp.fujitsu.com]
Sent: Tuesday, September 11, 2012 12:56 PM
To: user@hadoop.apache.org
Subject: Re: what happens when a datanode rejoins?

Hi Mehul
> Some of the blocks it was managing are deleted/modified?
The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes.
> The size of the blocks are now modified say from 64MB to 128MB?
Block size is a per-file setting, so new files will be 128MB, but the old ones will remain at 64MB.
> What if the block replication factor was one (yea not in most deployments but say incase) so does the namenode recreate a file once the datanode rejoins?
(Assuming you didn't perform a decommission.) Blocks that lived only on that datanode will be declared missing, and the files associated with those blocks cannot be fully read until the datanode rejoins.
George
Re: Undeliverable messages
Ha, good sleuthing. I just moved it to INFRA, as no one from our side has gotten to this yet. I guess we can only moderate, not administrate. So the ticket now awaits action from INFRA on ejecting it out.

On Tue, Sep 11, 2012 at 2:34 PM, Tony Burton tbur...@sportingindex.com wrote:
Thanks Harsh. By the looks of Stanislav's linkedin profile, he's moved on from Sungard, so Outlook's filtering rules will look after his list bounce messages from now on :)

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 10 September 2012 12:11
To: user@hadoop.apache.org
Subject: Re: Undeliverable messages

I'd filed this as much, but unsure how to get it done: https://issues.apache.org/jira/browse/HADOOP-8062. I'm not an admin on the mailing lists.

On Mon, Sep 10, 2012 at 3:16 PM, Tony Burton tbur...@sportingindex.com wrote:
Sorry for admin-only content: can we remove this address from the list? I get the bounce message below whenever I post to user@hadoop.apache.org. Thanks!
Tony

From: postmas...@sas.sungardrs.com [mailto:postmas...@sas.sungardrs.com]
Sent: 10 September 2012 10:45
To: Tony Burton
Subject: Undeliverable: build failure - trying to build hadoop trunk checkout

Delivery has failed to these recipients or distribution lists:
NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
An error occurred while trying to deliver this message to the recipient's e-mail address. Microsoft Exchange will not try to redeliver this message for you. Please try resending this message, or provide the following diagnostic text to your system administrator.

Diagnostic information for administrators:
Generating server: SUNGARD
NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
#5.0.0 X-Notes; The message could not be delivered because the recipient's destination email system is unknown or invalid. Please check the address and try again, or contact your system administrator to verify connectivity to the email system of #SMTP#

Original message headers:
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
X-Priority: 3 (Normal)
Importance: Normal
Date: Mon, 10 Sep 2012 05:43:30 -0400
From: tbur...@sportingindex.com
Subject: RE: build failure - trying to build hadoop trunk checkout
Message-ID: of32a4aa74.63265ba7-on85257a75.00358...@sas.sungardrs.com
X-MIMETrack: Serialize by Router on Notes9/Corporate/SunGard (Release 8.5.2FP2 | March 22, 2011) at 09/10/2012 05:44:23 AM, Serialize complete at 09/10/2012 05:44:23 AM
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=_=_NextPart_001_01CD8F38.BC144D00; charset=iso-8859-1

--
Harsh J
RE: hadoop trunk build failure - yarn, surefire related?
Hi Ramesh,

Thanks for the quick reply, but I'm having trouble following your English. Are you saying that there is one command to build everything? If so, can you tell me what it is?

Tony

From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com]
Sent: 11 September 2012 10:06
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

Hi,
Please find i think one command is there then only build the all applications.
Thanks & Regards,
Ramesh.Narasingu

On Tue, Sep 11, 2012 at 2:28 PM, Tony Burton tbur...@sportingindex.com wrote:
Hi,
I've checked out the hadoop trunk, and I'm running "mvn test" on the codebase as part of following the "How To Contribute" guide at http://wiki.apache.org/hadoop/HowToContribute. The tests are currently failing in hadoop-mapreduce-client-jobclient; the failure message is below - something to do with org.hadoop.yarn.client and/or the maven surefire plugin. Can anyone suggest how to proceed? Thanks!

[INFO] Reactor Summary:
[INFO] [snipped SKIPPED test groups]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s]
[INFO] BUILD FAILURE
[INFO] Total time: 17.229s
[INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
[INFO] Final Memory: 21M/163M
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project hadoop-mapreduce-client-jobclient: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed: java.lang.reflect.InvocationTargetException; nested exception is java.lang.reflect.InvocationTargetException: null: org/hadoop/yarn/client/YarnClientImpl: org.hadoop.yarn.client.YarnClientImpl - [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
Re: Undeliverable messages
And done. We shouldn't get this anymore. Thanks for bumping on this issue, Tony!

On Tue, Sep 11, 2012 at 2:44 PM, Harsh J ha...@cloudera.com wrote:
Ha, good sleuthing. I just moved it to INFRA, as no one from our side has gotten to this yet. I guess we can only moderate, not administrate. So the ticket now awaits action from INFRA on ejecting it out.

On Tue, Sep 11, 2012 at 2:34 PM, Tony Burton tbur...@sportingindex.com wrote:
Thanks Harsh. By the looks of Stanislav's linkedin profile, he's moved on from Sungard, so Outlook's filtering rules will look after his list bounce messages from now on :)

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 10 September 2012 12:11
To: user@hadoop.apache.org
Subject: Re: Undeliverable messages

I'd filed this as much, but unsure how to get it done: https://issues.apache.org/jira/browse/HADOOP-8062. I'm not an admin on the mailing lists.

On Mon, Sep 10, 2012 at 3:16 PM, Tony Burton tbur...@sportingindex.com wrote:
Sorry for admin-only content: can we remove this address from the list? I get the bounce message below whenever I post to user@hadoop.apache.org. Thanks!
Tony

From: postmas...@sas.sungardrs.com [mailto:postmas...@sas.sungardrs.com]
Sent: 10 September 2012 10:45
To: Tony Burton
Subject: Undeliverable: build failure - trying to build hadoop trunk checkout

Delivery has failed to these recipients or distribution lists:
NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
An error occurred while trying to deliver this message to the recipient's e-mail address. Microsoft Exchange will not try to redeliver this message for you. Please try resending this message, or provide the following diagnostic text to your system administrator.

Diagnostic information for administrators:
Generating server: SUNGARD
NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
#5.0.0 X-Notes; The message could not be delivered because the recipient's destination email system is unknown or invalid. Please check the address and try again, or contact your system administrator to verify connectivity to the email system of #SMTP#

Original message headers:
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
X-Priority: 3 (Normal)
Importance: Normal
Date: Mon, 10 Sep 2012 05:43:30 -0400
From: tbur...@sportingindex.com
Subject: RE: build failure - trying to build hadoop trunk checkout
Message-ID: of32a4aa74.63265ba7-on85257a75.00358...@sas.sungardrs.com
X-MIMETrack: Serialize by Router on Notes9/Corporate/SunGard (Release 8.5.2FP2 | March 22, 2011) at 09/10/2012 05:44:23 AM, Serialize complete at 09/10/2012 05:44:23 AM
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=_=_NextPart_001_01CD8F38.BC144D00; charset=iso-8859-1

--
Harsh J
Re: hadoop trunk build failure - yarn, surefire related?
It's probably some maven thing - in particular Maven's habit of grabbing the online nightly snapshots off apache rather than local. Try

mvn clean install -DskipTests -offline

to force in all the artifacts, then run the MR tests.

Tony - why not get on the mapreduce-dev mailing list, as this is the place to discuss build and test process.

On 11 September 2012 09:58, Tony Burton tbur...@sportingindex.com wrote:
Hi,
I've checked out the hadoop trunk, and I'm running "mvn test" on the codebase as part of following the "How To Contribute" guide at http://wiki.apache.org/hadoop/HowToContribute. The tests are currently failing in hadoop-mapreduce-client-jobclient; the failure message is below - something to do with org.hadoop.yarn.client and/or the maven surefire plugin. Can anyone suggest how to proceed? Thanks!

[INFO] Reactor Summary:
[INFO] [snipped SKIPPED test groups]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s]
[INFO] BUILD FAILURE
[INFO] Total time: 17.229s
[INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
[INFO] Final Memory: 21M/163M
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project hadoop-mapreduce-client-jobclient: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed: java.lang.reflect.InvocationTargetException; nested exception is java.lang.reflect.InvocationTargetException: null: org/hadoop/yarn/client/YarnClientImpl: org.hadoop.yarn.client.YarnClientImpl - [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
RE: hadoop trunk build failure - yarn, surefire related?
Thanks Steve, I'll try the mvn command you suggest. All the snapshots I can see came from repository.apache.org though. How do I run the MR tests only?

Thanks for the mapreduce-dev mailing list suggestion - I thought all the lists had merged into one though; did I get the wrong end of the stick?

Tony

From: Steve Loughran [mailto:ste...@hortonworks.com]
Sent: 11 September 2012 10:45
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

It's probably some maven thing - in particular Maven's habit of grabbing the online nightly snapshots off apache rather than local. Try

mvn clean install -DskipTests -offline

to force in all the artifacts, then run the MR tests.

Tony - why not get on the mapreduce-dev mailing list, as this is the place to discuss build and test process.

On 11 September 2012 09:58, Tony Burton tbur...@sportingindex.com wrote:
Hi,
I've checked out the hadoop trunk, and I'm running "mvn test" on the codebase as part of following the "How To Contribute" guide at http://wiki.apache.org/hadoop/HowToContribute. The tests are currently failing in hadoop-mapreduce-client-jobclient; the failure message is below - something to do with org.hadoop.yarn.client and/or the maven surefire plugin. Can anyone suggest how to proceed? Thanks!

[INFO] Reactor Summary:
[INFO] [snipped SKIPPED test groups]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s]
[INFO] BUILD FAILURE
[INFO] Total time: 17.229s
[INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
[INFO] Final Memory: 21M/163M
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project hadoop-mapreduce-client-jobclient: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed: java.lang.reflect.InvocationTargetException; nested exception is java.lang.reflect.InvocationTargetException: null: org/hadoop/yarn/client/YarnClientImpl: org.hadoop.yarn.client.YarnClientImpl - [Help 1]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
RE: Undeliverable messages
No problem! I'll remove that Outlook filter now... :)

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 11 September 2012 10:34
To: user@hadoop.apache.org
Subject: Re: Undeliverable messages

And done. We shouldn't get this anymore. Thanks for bumping on this issue, Tony!

On Tue, Sep 11, 2012 at 2:44 PM, Harsh J ha...@cloudera.com wrote:
Ha, good sleuthing. I just moved it to INFRA, as no one from our side has gotten to this yet. I guess we can only moderate, not administrate. So the ticket now awaits action from INFRA on ejecting it out.

On Tue, Sep 11, 2012 at 2:34 PM, Tony Burton tbur...@sportingindex.com wrote:
Thanks Harsh. By the looks of Stanislav's linkedin profile, he's moved on from Sungard, so Outlook's filtering rules will look after his list bounce messages from now on :)

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 10 September 2012 12:11
To: user@hadoop.apache.org
Subject: Re: Undeliverable messages

I'd filed this as much, but unsure how to get it done: https://issues.apache.org/jira/browse/HADOOP-8062. I'm not an admin on the mailing lists.

On Mon, Sep 10, 2012 at 3:16 PM, Tony Burton tbur...@sportingindex.com wrote:
Sorry for admin-only content: can we remove this address from the list? I get the bounce message below whenever I post to user@hadoop.apache.org. Thanks!
Tony

From: postmas...@sas.sungardrs.com [mailto:postmas...@sas.sungardrs.com]
Sent: 10 September 2012 10:45
To: Tony Burton
Subject: Undeliverable: build failure - trying to build hadoop trunk checkout

Delivery has failed to these recipients or distribution lists:
NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
An error occurred while trying to deliver this message to the recipient's e-mail address. Microsoft Exchange will not try to redeliver this message for you. Please try resending this message, or provide the following diagnostic text to your system administrator.

Diagnostic information for administrators:
Generating server: SUNGARD
NOTES:Stanislav_Seltser/SSP/SUNGARD%SSP-Bedford@sas.sungardrs.com
#5.0.0 X-Notes; The message could not be delivered because the recipient's destination email system is unknown or invalid. Please check the address and try again, or contact your system administrator to verify connectivity to the email system of #SMTP#

Original message headers:
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
X-Priority: 3 (Normal)
Importance: Normal
Date: Mon, 10 Sep 2012 05:43:30 -0400
From: tbur...@sportingindex.com
Subject: RE: build failure - trying to build hadoop trunk checkout
Message-ID: of32a4aa74.63265ba7-on85257a75.00358...@sas.sungardrs.com
X-MIMETrack: Serialize by Router on Notes9/Corporate/SunGard (Release 8.5.2FP2 | March 22, 2011) at 09/10/2012 05:44:23 AM, Serialize complete at 09/10/2012 05:44:23 AM
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=_=_NextPart_001_01CD8F38.BC144D00; charset=iso-8859-1

--
Harsh J
RE: hadoop trunk build failure - yarn, surefire related?
Good suggestions, Harsh and Hemanth. When I was asked to submit a patch for hadoop 1.0.3, I thought it a good exercise to work through the build process to become familiar, even though the patch is documentation-only. Maybe the requests for patches could come with a list of suggested reading as well as the link to http://wiki.apache.org/hadoop/HowToContribute, and the contributor can distill the appropriate important steps from all three. Otherwise, it's hard to tell which is the gospel version.

Tony

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 11 September 2012 11:59
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

I guess we'll need to clean that guide and divide it in two - for branch-1 maintenance contributors, and for trunk contributors. I had another page that serves a slightly different purpose, but may help you just the same: http://wiki.apache.org/hadoop/QwertyManiac/BuildingHadoopTrunk

On Tue, Sep 11, 2012 at 4:09 PM, Tony Burton tbur...@sportingindex.com wrote:
That's good to know - is there a more up to date guide than http://wiki.apache.org/hadoop/HowToContribute, which still makes many references to ant builds?

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 11 September 2012 11:36
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

Ah, there is no longer a need to run ant anymore on trunk. Ignore the leftover files in the base MR project directory - those should be cleaned up soon. All of the functional pieces of the build right now definitely use Maven.

On Tue, Sep 11, 2012 at 4:01 PM, Tony Burton tbur...@sportingindex.com wrote:
Ok - thanks Harsh. Following Steve's earlier advice I tried the mvn install, which worked fine, then ant test in hadoop-mapreduce-project, which failed. I was mid-email to mapreduce-dev@hadoop; now I'll try mvn test in hadoop-mapreduce-project and report back.
Tony

-----Original Message-----
From: Harsh J [mailto:ha...@cloudera.com]
Sent: 11 September 2012 11:25
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

Tony,
What I do is:

$ cd hadoop/; mvn install -DskipTests; cd hadoop-mapreduce-project/; mvn test

This seems to work in running without any missing dependencies at least. The user lists are all merged, but the developer lists remain separate as that works better.

On Tue, Sep 11, 2012 at 3:24 PM, Tony Burton tbur...@sportingindex.com wrote:
Thanks Steve, I'll try the mvn command you suggest. All the snapshots I can see came from repository.apache.org though. How do I run the MR tests only? Thanks for the mapreduce-dev mailing list suggestion - I thought all the lists had merged into one though; did I get the wrong end of the stick?
Tony

From: Steve Loughran [mailto:ste...@hortonworks.com]
Sent: 11 September 2012 10:45
To: user@hadoop.apache.org
Subject: Re: hadoop trunk build failure - yarn, surefire related?

It's probably some maven thing - in particular Maven's habit of grabbing the online nightly snapshots off apache rather than local. Try

mvn clean install -DskipTests -offline

to force in all the artifacts, then run the MR tests.

Tony - why not get on the mapreduce-dev mailing list, as this is the place to discuss build and test process.

On 11 September 2012 09:58, Tony Burton tbur...@sportingindex.com wrote:
Hi,
I've checked out the hadoop trunk, and I'm running "mvn test" on the codebase as part of following the "How To Contribute" guide at http://wiki.apache.org/hadoop/HowToContribute. The tests are currently failing in hadoop-mapreduce-client-jobclient; the failure message is below - something to do with org.hadoop.yarn.client and/or the maven surefire plugin. Can anyone suggest how to proceed? Thanks!

[INFO] Reactor Summary:
[INFO] [snipped SKIPPED test groups]
[INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s]
[INFO] BUILD FAILURE
[INFO] Total time: 17.229s
[INFO] Finished at: Tue Sep 11 09:53:44 BST 2012
[INFO] Final Memory: 21M/163M
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test (default-test) on project hadoop-mapreduce-client-jobclient: Execution default-test of goal org.apache.maven.plugins:maven-surefire-plugin:2.12:test failed: java.lang.reflect.InvocationTargetException; nested exception is java.lang.reflect.InvocationTargetException: null: org/hadoop/yarn/client/YarnClientImpl: org.hadoop.yarn.client.YarnClientImpl - [Help 1]
Re: what's the default reducer number?
Hi Lin The default value for the number of reducers is 1:

<name>mapred.reduce.tasks</name>
<value>1</value>

It is not determined by data volume. You need to specify the number of reducers for your mapreduce jobs as per your data volume. Regards Bejoy KS On Tue, Sep 11, 2012 at 4:53 PM, Jason Yang lin.yang.ja...@gmail.com wrote: Hi, all I was wondering what's the default number of reducers if I don't set it in configuration? Will it change dynamically according to the output volume of Mapper? -- YANG, Lin
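For reference, a minimal sketch of setting the reducer count per job in the old mapred API of that era (the class name and paths here are illustrative, not from the thread):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ReducerCountDemo {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(ReducerCountDemo.class);
        conf.setJobName("reducer-count-demo");
        conf.setNumReduceTasks(4);   // explicit; without this the default is 1
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);      // identity map/reduce by default
      }
    }

The same thing can be done without recompiling via -D mapred.reduce.tasks=4 on the command line, provided the driver is written to go through ToolRunner (this sketch is not).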
Re: FW: Doubts Reg
Dear Madam, I am attaching the relevant screenshots of running hive. Profile.png shows the contents of /etc/profile. hive_common_lib.png shows hive-common*.jar is already in $HIVE_HOME/lib; here $HIVE_HOME is /home/yahoo/hive/build/dist, as evident from classpath_err.png. Yours Truly G Sudha --- On Mon, 9/10/12, sowmya@nokia.com sowmya@nokia.com wrote: From: sowmya@nokia.com sowmya@nokia.com Subject: FW: Doubts Reg To: sudhasadhasi...@yahoo.com Cc: chidambara...@nokia.com Date: Monday, September 10, 2012, 9:53 AM Hi Mam, Please let us know if the below solution works. Thanks Regards, Sowmya Samuel From: Bhagtani Chandra (Nokia-LC/Bangalore) Sent: Monday, September 10, 2012 9:52 AM To: Talanki Shobanbabu (Nokia-LC/Bangalore); Nayak Santhosh (Nokia-LC/Bangalore) Cc: B Sowmya.1 (Nokia-LC/Bangalore) Subject: Re: Doubts Reg Hi Sowmya, This is a classpath problem. They have to check whether hive-common-*.jar is in hive home's lib directory. Thanks, Chandra From: Shoban shobanbabu.tala...@nokia.com Date: Monday, September 10, 2012 9:45 AM To: Bhagtani Chandra Prakash chandra.bhagt...@nokia.com, Nayak Santhosh (Nokia-LC/Bangalore) santhosh.na...@nokia.com Subject: FW: Doubts Reg Do you know what needs to be done here? This is from Univ. guys From: B Sowmya.1 (Nokia-LC/Bangalore) Sent: Monday, September 10, 2012 9:44 AM To: Talanki Shobanbabu (Nokia-LC/Bangalore) Subject: FW: Doubts Reg Hi Shoban, Can someone please provide a solution to this error if possible Thanks Regards, Sowmya Samuel From: ext sudha sadhasivam [mailto:sudhasadhasi...@yahoo.com] Sent: Saturday, September 08, 2012 2:23 PM To: B Sowmya.1 (Nokia-LC/Bangalore) Cc: K Chidambaran (Nokia-LC/Bangalore) Subject: Fw: Doubts Reg Sir We have doubts regarding hive installation. We are herewith attaching screenshots for the same; the path file for conf is not set. Can you give suggestions for the same? Thanking you Dr G sudha Sadasivam
Re: what's the default reducer number?
Hi Lin The default values for all the properties are in core-default.xml, hdfs-default.xml and mapred-default.xml. Regards Bejoy KS On Tue, Sep 11, 2012 at 5:06 PM, Jason Yang lin.yang.ja...@gmail.com wrote: Hi, Bejoy Thanks for your reply. Where could I find the default value of mapred.reduce.tasks? I have checked the core-site.xml, hdfs-site.xml and mapred-site.xml, but I haven't found it. 2012/9/11 Bejoy Ks bejoy.had...@gmail.com Hi Lin The default value for the number of reducers is 1:

<name>mapred.reduce.tasks</name>
<value>1</value>

It is not determined by data volume. You need to specify the number of reducers for your mapreduce jobs as per your data volume. Regards Bejoy KS On Tue, Sep 11, 2012 at 4:53 PM, Jason Yang lin.yang.ja...@gmail.com wrote: Hi, all I was wondering what's the default number of reducers if I don't set it in configuration? Will it change dynamically according to the output volume of Mapper? -- YANG, Lin -- YANG, Lin
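A note for the archive: the *-default.xml files don't live under conf/ - in the 1.x line they are bundled inside the Hadoop jars, which is why checking the *-site.xml files turns up nothing. One illustrative way to see an effective value from code (assuming a standard 1.x classpath):

    import org.apache.hadoop.mapred.JobConf;

    public class ShowDefault {
      public static void main(String[] args) {
        // JobConf pulls in mapred-default.xml (shipped inside the Hadoop
        // core jar) plus any mapred-site.xml overrides on the classpath.
        JobConf conf = new JobConf();
        System.out.println("mapred.reduce.tasks = " + conf.get("mapred.reduce.tasks"));
      }
    }

Run against a stock install, this should print 1.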
Re: Some general questions about DBInputFormat
Hi Yaron Sqoop uses a similar implementation. You can get some details there. Replies inline. • (more general question) Are there many use-cases for using DBInputFormat? Do most Hadoop jobs take their input from files or DBs? From my small experience, most MR jobs have data in hdfs. It is useful for getting data out of an rdbms into hadoop; the sqoop implementation is an example. • Since all mappers open a connection to the same DBS, one cannot use hundreds of mappers. Is there a solution to this problem? The number of mappers shouldn't be more than the permissible number of connections allowed for that db. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: Yaron Gonen yaron.go...@gmail.com Date: Tue, 11 Sep 2012 15:41:26 To: user@hadoop.apache.org Reply-To: user@hadoop.apache.org Subject: Some general questions about DBInputFormat Hi, After reviewing the class's (not very complicated) code, I have some questions I hope someone can answer: - (more general question) Are there many use-cases for using DBInputFormat? Do most Hadoop jobs take their input from files or DBs? - What happens when the database is updated during the mappers' data retrieval phase? Is there a way to lock the database before the data retrieval phase and release it afterwards? - Since all mappers open a connection to the same DBS, one cannot use hundreds of mappers. Is there a solution to this problem? Thanks, Yaron
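For the archive, a rough sketch of wiring up DBInputFormat in the new mapreduce API. The driver class, table, column and credential names below are all illustrative, and the JDBC driver jar must be on the job classpath:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
    import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
    import org.apache.hadoop.mapreduce.lib.db.DBWritable;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // One row of an illustrative "users" table.
    public class UserRecord implements Writable, DBWritable {
      long id;
      public void readFields(ResultSet rs) throws SQLException { id = rs.getLong("id"); }
      public void write(PreparedStatement ps) throws SQLException { ps.setLong(1, id); }
      public void readFields(DataInput in) throws IOException { id = in.readLong(); }
      public void write(DataOutput out) throws IOException { out.writeLong(id); }
      public String toString() { return Long.toString(id); }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
            "jdbc:mysql://dbhost/mydb", "dbuser", "dbpass");
        Job job = new Job(conf, "db-input-demo");
        job.setJarByClass(UserRecord.class);
        job.setInputFormatClass(DBInputFormat.class);
        // Table "users", no WHERE clause, ORDER BY id (used when paging the
        // table into splits), reading only the "id" column:
        DBInputFormat.setInput(job, UserRecord.class, "users", null, "id", "id");
        job.setMapperClass(Mapper.class);   // identity map, dump rows to text
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(UserRecord.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        // Each map task opens its own JDBC connection, so keep the map
        // count within what the database will accept.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }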
Question about the task assignment strategy
Hi, I want to make sure my understanding about task assignment in hadoop is correct or not. When scanning a file with multiple tasktrackers, I am wondering how a task is assigned to each tasktracker. Is it based on the block sequence or data locality? Let me explain my question by example. There is a file which is composed of 10 blocks (block1 to block10), where block1 is the beginning of the file and block10 is the tail of the file. When scanning the file with 3 tasktrackers (tt1 to tt3), I am wondering if task assignment is based on the block sequence, like first tt1 takes block1, tt2 takes block2, tt3 takes block3, tt1 takes block4 and so on, or task assignment is based on the task (data) locality, like first tt1 takes block2 (because it's stored locally), tt2 takes block1 (because it's stored locally), tt3 takes block4 (because it's stored locally) and so on. As far as I have experienced, and as the definitive guide book says, I think that the first case is the task assignment strategy. (And if there are many replicas, the closest one is picked.) Is this right? If this is right, is there any way to do it like the second case with the current implementation? Thanks, Hiroyuki
Re: Question about the task assignment strategy
Hi, Task assignment takes data locality into account first and not block sequence. In hadoop, tasktrackers ask the jobtracker to be assigned tasks. When such a request comes to the jobtracker, it will try to look for an unassigned task which needs data that is close to the tasktracker and will assign it. Thanks Hemanth On Tue, Sep 11, 2012 at 6:31 PM, Hiroyuki Yamada mogwa...@gmail.com wrote: Hi, I want to make sure my understanding about task assignment in hadoop is correct or not. When scanning a file with multiple tasktrackers, I am wondering how a task is assigned to each tasktracker . Is it based on the block sequence or data locality ? Let me explain my question by example. There is a file which composed of 10 blocks (block1 to block10), and block1 is the beginning of the file and block10 is the tail of the file. When scanning the file with 3 tasktrackers (tt1 to tt3), I am wondering if task assignment is based on the block sequence like first tt1 takes block1 and tt2 takes block2 and tt3 takes block3 and tt1 takes block4 and so on or task assignment is based on the task(data) locality like first tt1 takes block2(because it's located in the local) and tt2 takes block1 (because it's located in the local) and tt3 takes block 4(because it's located in the local) and so on. As far as I experienced and the definitive guide book says, I think that the first case is the task assignment strategy. (and if there are many replicas, closest one is picked.) Is this right ? If this is right, is there any way to do like the second case with the current implementation ? Thanks, Hiroyuki
RE: hadoop trunk build failure - yarn, surefire related?
Another mvn test caused the build to fail slightly further down the road. As my Jira issue is documentation-only, I've submitted the patch anyway. Is this multiple-failure scenario typical for trying to build hadoop from the trunk? It's sure putting me off submitting code in future. Is there any way the build process could be reconsidered such that there's only one source of instruction for how to successfully build the project? Tony -Original Message- From: Tony Burton [mailto:tbur...@sportingindex.com] Sent: 11 September 2012 12:15 To: user@hadoop.apache.org Subject: RE: hadoop trunk build failure - yarn, surefire related? Good suggestions Harsh and Hemanth. When I was asked to submit a patch for hadoop 1.0.3, I thought it a good exercise to work through the build process to become familiar even though the patch is documentation-only. Maybe the requests for patches could come with a list of suggested reading as well as the link to http://wiki.apache.org/hadoop/HowToContribute, and the contributer can distill the appropriate important steps from all three. Otherwise, it's hard to tell which is the gospel version. Tony -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: 11 September 2012 11:59 To: user@hadoop.apache.org Subject: Re: hadoop trunk build failure - yarn, surefire related? I guess we'll need to clean that guide and divide it in two - For branch-1 maintenance contributors, and for trunk contributors. I had another page that serves a slightly different purpose, but may help you just the same: http://wiki.apache.org/hadoop/QwertyManiac/BuildingHadoopTrunk On Tue, Sep 11, 2012 at 4:09 PM, Tony Burton tbur...@sportingindex.com wrote: That's good to know - is there a more up to date guide than http://wiki.apache.org/hadoop/HowToContribute which still makes many references to ant builds? -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: 11 September 2012 11:36 To: user@hadoop.apache.org Subject: Re: hadoop trunk build failure - yarn, surefire related? Ah there is no longer a need to run ant anymore on trunk. Ignore the leftover files in the base MR project directory - those should be cleaned up soon. All of the functional pieces of the build right now definitely use Maven. On Tue, Sep 11, 2012 at 4:01 PM, Tony Burton tbur...@sportingindex.com wrote: Ok - thanks Harsh. Following Steve's earlier advice I tried the mvn install, which worked fine, then ant test in hadoop-mapreduce-project which failed. I was mid-email to mapreduce-dev@hadoop, now I'll try mvn test in hadoop-mapreduce-project and report back. Tony -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: 11 September 2012 11:25 To: user@hadoop.apache.org Subject: Re: hadoop trunk build failure - yarn, surefire related? Tony, What I do is: $ cd hadoop/; mvn install -DskipTests; cd hadoop-mapreduce-project/; mvn test This seems to work in running without any missing dependencies at least. The user lists are all merged, but the developer lists remain separate as that works better. On Tue, Sep 11, 2012 at 3:24 PM, Tony Burton tbur...@sportingindex.com wrote: Thanks Steve, I'll try the mvn command you suggest. All the snapshots I can see came from repository.apache.org though. How do I run the MR tests only? Thanks for the mapreduce-dev mailing list suggestion, I thought all lists had merged into one though - did I get the wrong end of the stick? 
Tony From: Steve Loughran [mailto:ste...@hortonworks.com] Sent: 11 September 2012 10:45 To: user@hadoop.apache.org Subject: Re: hadoop trunk build failure - yarn, surefire related? It's probably some maven thing -in particular Maven's habit of grabbing the online nightly snapshots off apache rather than local, try mvn clean install -DskipTests -offline to force in all the artifacts, then run the MR tests Tony -why not get on the mapreduce-dev mailing list, as this is the place to discuss build and test process. On 11 September 2012 09:58, Tony Burton tbur...@sportingindex.com wrote: Hi, I've checked out the hadoop trunk, and I'm running mvn test on the codebase as part of following the How To Contribute guide at http://wiki.apache.org/hadoop/HowToContribute. The tests are currently failing in hadoop-mapreduce-client-jobclient, the failure message is below - something to do with org.hadoop.yarn.client and/or the maven surefire plugin. Can anyone suggest how to proceed? Thanks! [INFO] [INFO] Reactor Summary: [INFO] [snipped SKIPPED test groups] [INFO] hadoop-mapreduce-client-jobclient . FAILURE [13.988s] [INFO] [INFO] [INFO] BUILD FAILURE [INFO]
Re: Error in : hadoop fsck /
Could you please review your configuration to see if you are pointing to the right namenode address? (This will be in core-site.xml.) Please paste it here so we can look for clues. Thanks hemanth On Tue, Sep 11, 2012 at 9:25 PM, yogesh dhari yogeshdh...@live.com wrote: Hi all, I am running hadoop-0.20.2 on a single node cluster. I run the command hadoop fsck / and it shows this error:

Exception in thread "main" java.net.UnknownHostException: http
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:378)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:473)
    at sun.net.www.http.HttpClient.init(HttpClient.java:203)
    at sun.net.www.http.HttpClient.New(HttpClient.java:290)
    at sun.net.www.http.HttpClient.New(HttpClient.java:306)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:995)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:931)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:849)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1299)
    at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:123)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:159)

Please suggest why this is so; it should show the health status.
Re: Error in : hadoop fsck /
Yogesh try this: hadoop fsck -Ddfs.http.address=localhost:50070 / 50070 is the default http port that the namenode runs on. The property dfs.http.address should be set in your hdfs-site.xml. -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Sep 11, 2012, at 9:03 AM, yogesh dhari yogeshdh...@live.com wrote: Hi Hemant, It's the content of core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-0.20.2/hadoop_temporary_dirr</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

Regards Yogesh Kumar Date: Tue, 11 Sep 2012 21:29:36 +0530 Subject: Re: Error in : hadoop fsck / From: yhema...@thoughtworks.com To: user@hadoop.apache.org Could you please review your configuration to see if you are pointing to the right namenode address? (This will be in core-site.xml.) Please paste it here so we can look for clues. Thanks hemanth On Tue, Sep 11, 2012 at 9:25 PM, yogesh dhari yogeshdh...@live.com wrote: Hi all, I am running hadoop-0.20.2 on a single node cluster. I run the command hadoop fsck / and it shows this error:

Exception in thread "main" java.net.UnknownHostException: http
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:178)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:391)
    at java.net.Socket.connect(Socket.java:579)
    at java.net.Socket.connect(Socket.java:528)
    at sun.net.NetworkClient.doConnect(NetworkClient.java:180)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:378)
    at sun.net.www.http.HttpClient.openServer(HttpClient.java:473)
    at sun.net.www.http.HttpClient.init(HttpClient.java:203)
    at sun.net.www.http.HttpClient.New(HttpClient.java:290)
    at sun.net.www.http.HttpClient.New(HttpClient.java:306)
    at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:995)
    at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:931)
    at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:849)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1299)
    at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:123)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:159)

Please suggest why this is so; it should show the health status.
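For completeness, the corresponding hdfs-site.xml entry would look something like this (host and port illustrative; 50070 is the default):

<property>
  <name>dfs.http.address</name>
  <value>localhost:50070</value>
</property>

After editing, restart the namenode so the setting takes effect.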
how to specify the root directory of hadoop on slave node?
Hi, All I need to set up a hadoop/hdfs cluster with one namenode on a machine and two datanodes on two other machines. But after setting the datanode machines in the conf/slaves file, running bin/start-dfs.sh cannot start hdfs normally. I am aware that I have not specified the root directory where hadoop is installed on the slave nodes, or the OS user account to use for hadoop on the slave nodes. I am asking how to specify where hadoop/hdfs is locally installed on a slave node. Also, how do I specify the user account used to start hdfs there? Regards, Richard
security-in-HADOOP
How is security maintained in hadoop? Is it maintained by giving folder/file permissions in hadoop? How can I make sure that somebody else doesn't write into my hdfs file system?
removing datanodes from clusters.
Hello all, I am not finding a clear way to remove a datanode from the cluster. Please explain the decommissioning steps with an example, like how to create exclude files and the other steps involved. Thanks regards Yogesh Kumar
Re: security-in-HADOOP
By reading the documentation, like the following http://hadoop.apache.org/docs/r1.0.3/hdfs_permissions_guide.html On Tue, Sep 11, 2012 at 8:14 PM, nisha nishakulkarn...@gmail.com wrote: How security is maintained in hadoop, is it maintained by giving folder/file permissions in hadoop how can i make sure that somebody else dunt write in to my hdfs file system ...
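To make that concrete: HDFS follows the familiar owner/group/mode model, managed with the usual shell verbs. An illustrative session (user and path names made up):

    hadoop fs -chown nisha:nisha /user/nisha
    hadoop fs -chmod 700 /user/nisha    # only nisha may read or write beneath this
    hadoop fs -ls /user

Keep in mind that enforcement depends on dfs.permissions being true (the default), and that without Kerberos security enabled the client-supplied username is taken on trust, so this protects against accidents more than against a determined user.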
Re: How to remove datanode from cluster..
Hi Yogesh The detailed steps are available in the hadoop wiki on the FAQ page: http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F Regards Bejoy KS On Wed, Sep 12, 2012 at 12:14 AM, yogesh dhari yogeshdh...@live.com wrote: Hello all, I am not getting the clear way out to remove datanode from the cluster. please explain me decommissioning steps with example. like how to creating exclude files and other steps involved in it. Thanks regards Yogesh Kumar
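In outline, the decommission recipe from that FAQ looks like this (file locations illustrative):

1. On the namenode, point dfs.hosts.exclude at an exclude file in hdfs-site.xml:

<property>
  <name>dfs.hosts.exclude</name>
  <value>/opt/hadoop/conf/excludes</value>
</property>

2. Add the hostname of each datanode to be retired to that file, one per line.
3. Run: hadoop dfsadmin -refreshNodes
4. Watch the namenode web UI until the node moves from "Decommission In Progress" to "Decommissioned" (meaning its blocks have been re-replicated elsewhere), then stop the datanode process.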
Re: Question about the task assignment strategy
I figured out the cause. The HDFS block size is 128MB, but I specified mapred.min.split.size as 512MB, and data-local I/O processing goes wrong for some reason. When I remove the mapred.min.split.size configuration, tasktrackers pick data-local tasks. Why does this happen? It seems like a bug. A split is a logical container of blocks, so nothing is wrong logically. On Wed, Sep 12, 2012 at 1:20 AM, Hiroyuki Yamada mogwa...@gmail.com wrote: Hi, thank you for the comment. Task assignment takes data locality into account first and not block sequence. Does it work like that when replica factor is set to 1 ? I just had a experiment to check the behavior. There are 14 nodes (node01 to node14) and there are 14 datanodes and 14 tasktrackers working. I first created a data to be processed in each node (say data01 to data14), and I put the each data to the hdfs from each node (at /data directory. /data/data01, ... /data/data14). Replica factor is set to 1, so according to the default block placement policy, each data is stored at local node. (data01 is stored at node01, data02 is stored at node02 and so on) In that setting, I launched a job that processes the /data and what happened is that tasktrackers read from data01 to data14 sequentially, which means tasktrackers first take all data from node01 and then node02 and then node03 and so on. If tasktracker takes data locality into account as you say, each tasktracker should take the local task(data). (tasktrackers at node02 should take data02 blocks if there is any) But, it didn't work like that. What this is happening ? Is there any documents about this ? What part of the source code is doing that ? Regards, Hiroyuki On Tue, Sep 11, 2012 at 11:27 PM, Hemanth Yamijala yhema...@thoughtworks.com wrote: Hi, Task assignment takes data locality into account first and not block sequence. In hadoop, tasktrackers ask the jobtracker to be assigned tasks. When such a request comes to the jobtracker, it will try to look for an unassigned task which needs data that is close to the tasktracker and will assign it. Thanks Hemanth On Tue, Sep 11, 2012 at 6:31 PM, Hiroyuki Yamada mogwa...@gmail.com wrote: Hi, I want to make sure my understanding about task assignment in hadoop is correct or not. When scanning a file with multiple tasktrackers, I am wondering how a task is assigned to each tasktracker . Is it based on the block sequence or data locality ? Let me explain my question by example. There is a file which composed of 10 blocks (block1 to block10), and block1 is the beginning of the file and block10 is the tail of the file. When scanning the file with 3 tasktrackers (tt1 to tt3), I am wondering if task assignment is based on the block sequence like first tt1 takes block1 and tt2 takes block2 and tt3 takes block3 and tt1 takes block4 and so on or task assignment is based on the task(data) locality like first tt1 takes block2(because it's located in the local) and tt2 takes block1 (because it's located in the local) and tt3 takes block 4(because it's located in the local) and so on. As far as I experienced and the definitive guide book says, I think that the first case is the task assignment strategy. (and if there are many replicas, closest one is picked.) Is this right ? If this is right, is there any way to do like the second case with the current implementation ? Thanks, Hiroyuki
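This looks less like a bug than a consequence of how splits are sized. Paraphrasing the old mapred FileInputFormat logic from the 1.x source (variable names approximate):

    // goalSize = total input bytes / requested number of maps
    // minSize  = mapred.min.split.size; blockSize = the file's block size
    long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));
    // With minSize = 512MB and blockSize = 128MB this yields 512MB splits,
    // so each split spans roughly 4 blocks. With replication 1 those blocks
    // usually live on 4 different datanodes, and a task can be node-local to
    // at most one of them, so most of each split's bytes are read remotely.
    // Keeping mapred.min.split.size <= blockSize keeps one split per block
    // and preserves locality.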
Re: Accessing image files from hadoop to jsp
Here's one... Write a Java program which can be accessed on the server side to pull the picture from HDFS and display it on your JSP. On Sep 11, 2012, at 3:48 PM, Visioner Sadak visioner.sa...@gmail.com wrote: any hints, experts? At least tell me if I'm on the right track, or can't we use hftp at all because the browser won't understand it? On Mon, Sep 10, 2012 at 1:58 PM, Visioner Sadak visioner.sa...@gmail.com wrote: or should I use the datanode ip for accessing images using hftp:

<td><img src="hftp://localhost:50075/Comb/java1.jpg"/></td>

my datanode is at 50075 On Sat, Sep 8, 2012 at 8:30 PM, Visioner Sadak visioner.sa...@gmail.com wrote: seems like webhdfs is available in versions above 1.0; for the time being, just to do a POC, can't I use hftp in the 0.22.0 version? I have to just show the img on a jsp, something like this. Any ideas on how to configure hftp, security and all? Fully confused, because if the url is accessible as below, any WAR can access it, right? Or will we have to do it from admin configs...

<td><img src="hftp://localhost:50070/Comb/java1.jpg"/></td>

On Sat, Sep 8, 2012 at 6:47 PM, Harsh J ha...@cloudera.com wrote: I'd say yes. Use either WebHDFS (requires client to be able to access all HDFS nodes) or HttpFS (acts as a gateway if your client can't access all HDFS nodes) via their REST APIs. On Sat, Sep 8, 2012 at 3:48 PM, Visioner Sadak visioner.sa...@gmail.com wrote: is webhdfs a viable approach??? On Thu, Sep 6, 2012 at 8:26 PM, Visioner Sadak visioner.sa...@gmail.com wrote: I have uploaded some images to the hdfs hadoop user/combo/ directory and now want to show those images in a jsp. I have configured tomcat and hadoop properly and am able to do uploads; any ideas on how to build the interface to show these images on my front end jsp... -- Harsh J
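Along the lines of that suggestion, a minimal server-side sketch, assuming a servlet container with the Hadoop jars and config on its classpath; the namenode URL, paths and parameter name are illustrative:

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class HdfsImageServlet extends HttpServlet {
      protected void doGet(HttpServletRequest req, HttpServletResponse resp)
          throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://localhost:9000"); // illustrative
        FileSystem fs = FileSystem.get(conf);
        String name = req.getParameter("name");               // e.g. name=java1.jpg
        if (name == null || name.contains("/")) {             // crude traversal guard
          resp.sendError(HttpServletResponse.SC_BAD_REQUEST);
          return;
        }
        resp.setContentType("image/jpeg");
        FSDataInputStream in = fs.open(new Path("/Comb/" + name));
        try {
          IOUtils.copyBytes(in, resp.getOutputStream(), conf, false);
        } finally {
          in.close();
        }
      }
    }

The JSP then just embeds <img src="/images?name=java1.jpg"/> against whatever URL pattern the servlet is mapped to, and the browser never needs to understand hftp at all.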
How to split a sequence file
Hi, I have a sequence file written by SequenceFileOutputFormat with key/value types of <Text, BytesWritable>, like below:

Text      BytesWritable
-------   ----------------------------------------
id_A_01   7F2B3C687F2B3C687F2B3C68
id_A_02   2F2B3C687F2B3C687F2B3C686AB23C68D73C68D7
id_A_03   5F2B3C68D77F2B3C687F2B3A
...
id_B_01   1AB23C68D73C68D76AB23C68D73C68D7
id_B_02   5AB23C68D73C68D76AB68D76A1
id_B_03   F2B23C68D7B23C68D7B23C68D7

If I want all the records with the same key prefix to be processed by the same mapper, say records with key id_A_XX are processed by one mapper and records with key id_B_XX are processed by another mapper, what should I do? Should I implement my own InputFormat inheriting from SequenceFileInputFormat? Any help would be appreciated. -- YANG, Lin
Re: How to not output the key
Hi, You have to specify the reducer key out type as NullWritable. Cheers! Manoj. On Wed, Sep 12, 2012 at 7:43 AM, Nataraj Rashmi - rnatar rashmi.nata...@acxiom.com wrote: Hello, I have a simple map/reduce program to merge input files into one big output file. My question is, is there a way not to output the key from the reducer to the output file? I only want the value, not the key, for each record. Thanks
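Concretely, that looks something like this (a sketch against the new mapreduce API; the Text value type is just for illustration):

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Emits NullWritable as the key; TextOutputFormat then writes only the
    // value (no key, no tab separator) for each record.
    public class MergeReducer extends Reducer<Text, Text, NullWritable, Text> {
      protected void reduce(Text key, Iterable<Text> values, Context ctx)
          throws IOException, InterruptedException {
        for (Text v : values) {
          ctx.write(NullWritable.get(), v);
        }
      }
    }

    // In the driver:
    //   job.setOutputKeyClass(NullWritable.class);
    //   job.setOutputValueClass(Text.class);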
RE: Issue in access static object in MapReduce
Thanks Bejoy, I'll try to implement it and will let you know if I face any issues. Thanks Stuti From: Bejoy Ks [mailto:bejoy.had...@gmail.com] Sent: Tuesday, September 11, 2012 8:39 PM To: user@hadoop.apache.org Subject: Re: Issue in access static object in MapReduce Hi Stuti You can pass the json object as a configuration property from your main class, then initialize this static json object in the configure() method. Every instance of a map or reduce task will have this configure() method executed once before the map()/reduce() function, so all the executions of map()/reduce() on each record can have a look-up into the json object. public void configure(JobConf job) Regards Bejoy KS On Tue, Sep 11, 2012 at 8:23 PM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi, I have a configuration JSON file which is accessed by the MR job for every input. So, I created a class with a static block and load the JSON file into a static instance variable, so that every time my mapper or reducer wants to access the configuration it can use this instance variable. But on a single node cluster, when I run this as a jar, my mapper executes fine, but in the reducer the static instance variable is returning null. I get this mainly because mapper and reducer run on separate jvms. How can I handle this situation gracefully? There are a few ways that I can think of: 1. Serialize the JSON object and store it on HDFS, then deserialize and use it in the code. The problem is the object is not too large, so it would not be useful to store it on HDFS. 2. I load the JSON for every input line of the map/reduce call. This is highly inefficient, as I'll be loading the JSON for so many lines again and again. How do I resolve this issue? Any inputs are welcome. Note, the same code works well in eclipse. :( Thanks Stuti Regards, Stuti Awasthi HCL Comnet Systems and Services Ltd
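A sketch of that pattern in the old mapred API. The property name and the JSON handling are illustrative, and any JSON library would do for the parse step:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class LookupMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private String configJson;  // one parsed copy per task JVM

      @Override
      public void configure(JobConf job) {
        // Runs once per task attempt, before any map() call. The same hook
        // exists on the reducer side, which is what a static block
        // initialised in the driver JVM cannot reach.
        configJson = job.get("myjob.config.json");  // property name illustrative
        // parse configJson into whatever lookup structure you need here
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> out, Reporter reporter)
          throws IOException {
        // consult the parsed structure per record, then emit
        out.collect(new Text("record"), value);
      }
    }

    // Driver side, before submitting:
    //   conf.set("myjob.config.json", jsonString);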
RE: removing datanodes from clusters.
Hi Yogesh, FYI, please go through the following: http://tech.zhenhua.info/2011/04/how-to-decommission-nodesblacklist.html http://hadoop-karma.blogspot.in/2011/01/hadoop-cookbook-how-to-decommission.html From: yogesh dhari [yogeshdh...@live.com] Sent: Wednesday, September 12, 2012 2:16 AM To: user@hadoop.apache.org Subject: removing datanodes from clusters. Hello all, I am not getting the clear way out to remove datanode from the cluster. please explain me decommissioning steps with example. like how to creating exclude files and other steps involved in it. Thanks regards Yogesh Kumar
Re: Issue in access static object in MapReduce
Have you looked at Terracotta or any other distributed caching system? Kunal -- Sent while mobile -- On Sep 11, 2012, at 9:30 PM, Stuti Awasthi stutiawas...@hcl.com wrote: Thanks Bejoy, I try to implement and if face any issues will let you know. Thanks Stuti From: Bejoy Ks [mailto:bejoy.had...@gmail.com] Sent: Tuesday, September 11, 2012 8:39 PM To: user@hadoop.apache.org Subject: Re: Issue in access static object in MapReduce Hi Stuti You can pass the json object as a configuration property from your main class then Initialize this static json object on the configure() method. Every instance of map or reduce task will have this configure() method executed once before the map()/reduce() function . So all the executions of map()/reduce() on each record can have a look up into the json object. public void configure(JobConf job) Regards Bejoy KS On Tue, Sep 11, 2012 at 8:23 PM, Stuti Awasthi stutiawas...@hcl.com wrote: Hi, I have a configuration JSON file which is accessed by MR job for every input.So , I created a class with a static block, load the JSON file in static Instance variable. So everytime my mapper or reducer wants to access configuration can use this Instance variable. But on a single node cluster,when I run this as a jar, my mapper is executing fine, But in reducer static Instance variable is retuning null. I get this mainly because mapper and reducer runs on separate jvms. How can I handle this situation gracefully. There are few ways that I can think of: 1. Serialize the JSON load object and store it on HDFS and then deserialize and use in the code. Problem is obj is not too large so not be useful to store on HDFS 2. I load the JSON for every Input line of map reduce call. This is highly inefficient and as Il be loading JSON for so many lines again and again. How to resolve this issue.. Any Inputs are welcome. Note , same code works with eclipse well. :( Thanks Stuti Regards, Stuti Awasthi HCL Comnet Systems and Services Ltd
Re: How to split a sequence file
If the file is pre-sorted, why not just make multiple sequence files - 1 for each split? Then you don't have to compute InputSplits because the physical files are already split. On Tue, Sep 11, 2012 at 11:00 PM, Harsh J ha...@cloudera.com wrote: Hey Jason, Is the file pre-sorted? You could override the InputFormat's #getSplits method to return InputSplits at identified key boundaries, as one solution - this would require reading the file up-front (at submit-time) and building the input splits out of it. On Wed, Sep 12, 2012 at 8:45 AM, Jason Yang lin.yang.ja...@gmail.com wrote: Hi, I have a sequence file written by SequenceFileOutputFormat with key/value types of <Text, BytesWritable>, like below:

Text      BytesWritable
-------   ----------------------------------------
id_A_01   7F2B3C687F2B3C687F2B3C68
id_A_02   2F2B3C687F2B3C687F2B3C686AB23C68D73C68D7
id_A_03   5F2B3C68D77F2B3C687F2B3A
...
id_B_01   1AB23C68D73C68D76AB23C68D73C68D7
id_B_02   5AB23C68D73C68D76AB68D76A1
id_B_03   F2B23C68D7B23C68D7B23C68D7

If I want all the records with the same key prefix to be processed by the same mapper, say records with key id_A_XX are processed by one mapper and records with key id_B_XX are processed by another mapper, what should I do? Should I implement my own InputFormat inheriting from SequenceFileInputFormat? Any help would be appreciated. -- YANG, Lin -- Harsh J
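One way to realise the "one physical file per group" idea is to repartition by prefix first: a job whose partitioner routes every id_X_* key to the same reducer, with SequenceFileOutputFormat, leaves one part file per group, and a follow-up job whose input format forbids splitting then hands one whole group to each mapper. A sketch (new mapreduce API; the prefix parsing is illustrative, and note that distinct prefixes can still hash to the same reducer):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.Partitioner;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

    // Job 1: route records by key prefix ("id_A_01" -> "id_A").
    public class PrefixPartitioner extends Partitioner<Text, BytesWritable> {
      @Override
      public int getPartition(Text key, BytesWritable value, int numPartitions) {
        String s = key.toString();
        int cut = s.lastIndexOf('_');
        String prefix = cut > 0 ? s.substring(0, cut) : s;
        return (prefix.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }

    // Job 2: read each part file whole, so one mapper sees one group.
    class WholeFileSequenceInputFormat
        extends SequenceFileInputFormat<Text, BytesWritable> {
      @Override
      protected boolean isSplitable(JobContext context, Path file) {
        return false;
      }
    }

Job 1 sets job.setPartitionerClass(PrefixPartitioner.class) with roughly one reducer per expected group; Job 2 sets job.setInputFormatClass(WholeFileSequenceInputFormat.class).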
Re: Question about the task assignment strategy
Hi, I tried a similar experiment to yours but couldn't replicate the issue. I generated 64 MB files and added them to my DFS - one file from every machine, with a replication factor of 1, like you did. My block size was 64MB. I verified the blocks were located on the same machine as where I added them from. Then, I launched a wordcount (without the min split size config). As expected, it created 8 maps, and I could verify that it ran all the tasks as data local - i.e. every task read off its own datanode. From the launch times of the tasks, I could roughly feel that this scheduling behaviour was independent of the order in which the tasks were launched. This behaviour was retained even with the min split size config. Could you share the size of the input you generated (i.e. the size of data01..data14)? Also, what job are you running - specifically, what is the input format? BTW, this wiki entry: http://wiki.apache.org/hadoop/HowManyMapsAndReduces talks a little bit about how the maps are created. Thanks Hemanth On Wed, Sep 12, 2012 at 7:49 AM, Hiroyuki Yamada mogwa...@gmail.com wrote: I figured out the cause. HDFS block size is 128MB, but I specify mapred.min.split.size as 512MB, and data local I/O processing goes wrong for some reason. When I remove the mapred.min.split.size configuration, tasktrackers pick data-local tasks. Why does it happen ? It seems like a bug. Split is a logical container of blocks, so nothing is wrong logically. On Wed, Sep 12, 2012 at 1:20 AM, Hiroyuki Yamada mogwa...@gmail.com wrote: Hi, thank you for the comment. Task assignment takes data locality into account first and not block sequence. Does it work like that when replica factor is set to 1 ? I just had a experiment to check the behavior. There are 14 nodes (node01 to node14) and there are 14 datanodes and 14 tasktrackers working. I first created a data to be processed in each node (say data01 to data14), and I put the each data to the hdfs from each node (at /data directory. /data/data01, ... /data/data14). Replica factor is set to 1, so according to the default block placement policy, each data is stored at local node. (data01 is stored at node01, data02 is stored at node02 and so on) In that setting, I launched a job that processes the /data and what happened is that tasktrackers read from data01 to data14 sequentially, which means tasktrackers first take all data from node01 and then node02 and then node03 and so on. If tasktracker takes data locality into account as you say, each tasktracker should take the local task(data). (tasktrackers at node02 should take data02 blocks if there is any) But, it didn't work like that. What this is happening ? Is there any documents about this ? What part of the source code is doing that ? Regards, Hiroyuki On Tue, Sep 11, 2012 at 11:27 PM, Hemanth Yamijala yhema...@thoughtworks.com wrote: Hi, Task assignment takes data locality into account first and not block sequence. In hadoop, tasktrackers ask the jobtracker to be assigned tasks. When such a request comes to the jobtracker, it will try to look for an unassigned task which needs data that is close to the tasktracker and will assign it. Thanks Hemanth On Tue, Sep 11, 2012 at 6:31 PM, Hiroyuki Yamada mogwa...@gmail.com wrote: Hi, I want to make sure my understanding about task assignment in hadoop is correct or not. When scanning a file with multiple tasktrackers, I am wondering how a task is assigned to each tasktracker . Is it based on the block sequence or data locality ? Let me explain my question by example.
There is a file which composed of 10 blocks (block1 to block10), and block1 is the beginning of the file and block10 is the tail of the file. When scanning the file with 3 tasktrackers (tt1 to tt3), I am wondering if task assignment is based on the block sequence like first tt1 takes block1 and tt2 takes block2 and tt3 takes block3 and tt1 takes block4 and so on or task assignment is based on the task(data) locality like first tt1 takes block2(because it's located in the local) and tt2 takes block1 (because it's located in the local) and tt3 takes block 4(because it's located in the local) and so on. As far as I experienced and the definitive guide book says, I think that the first case is the task assignment strategy. (and if there are many replicas, closest one is picked.) Is this right ? If this is right, is there any way to do like the second case with the current implementation ? Thanks, Hiroyuki