Re: simple hadoop pseudo-distributed mode instructions
great thanks Jagat ! On Fri, Mar 23, 2012 at 1:42 AM, Jagat wrote: > Hi Jay > > Just follow this to install > > http://jugnu-life.blogspot.in/2012/03/hadoop-installation-tutorial.html > > The official tutorial at link below is also useful > > http://hadoop.apache.org/common/docs/r1.0.1/single_node_setup.html > > Thanks > > Jagat > > On Fri, Mar 23, 2012 at 12:08 PM, Jay Vyas wrote: > > > Hi guys : What the latest, simplest, best directions to get a tiny, > > psuedodistributed hadoop setup running on my ubuntu machine ? > > > > On Wed, Mar 21, 2012 at 5:14 PM, wrote: > > > > > Owen, > > > > > > Is there interest in reverting hadoop-2399 in 0.23.x ? > > > > > > - Milind > > > > > > --- > > > Milind Bhandarkar > > > Greenplum Labs, EMC > > > (Disclaimer: Opinions expressed in this email are those of the author, > > and > > > do not necessarily represent the views of any organization, past or > > > present, the author might be affiliated with.) > > > > > > > > > > > > On 3/19/12 11:20 PM, "Owen O'Malley" wrote: > > > > > > >On Mon, Mar 19, 2012 at 11:05 PM, madhu phatak > > > >wrote: > > > > > > > >> Hi Owen O'Malley, > > > >> Thank you for that Instant reply. It's working now. Can you explain > > me > > > >> what you mean by "input to reducer is reused" in little detail? > > > > > > > > > > > >Each time the statement "Text value = values.next();" is executed it > > > >always > > > >returns the same Text object with the contents of that object changed. > > > >When > > > >you add the Text to the list, you are adding a pointer to the same > Text > > > >object. At the end you have 6 copies of the same pointer instead of 6 > > > >different Text objects. > > > > > > > >The reason that I said it is my fault, is because I added the > > optimization > > > >that causes it. If you are interested in Hadoop archeology, it was > > > >HADOOP-2399 that made the change. I also did HADOOP-3522 to improve > the > > > >documentation in the area. 
> > > > > > > >-- Owen > > > > > > > > > > > > -- > > Jay Vyas > > MMSB/UCHC > > > -- Jay Vyas MMSB/UCHC
Re: Very strange Java Collection behavior in Hadoop
Hi Jay Just follow this to install http://jugnu-life.blogspot.in/2012/03/hadoop-installation-tutorial.html The official tutorial at link below is also useful http://hadoop.apache.org/common/docs/r1.0.1/single_node_setup.html Thanks Jagat On Fri, Mar 23, 2012 at 12:08 PM, Jay Vyas wrote: > Hi guys : What the latest, simplest, best directions to get a tiny, > psuedodistributed hadoop setup running on my ubuntu machine ? > > On Wed, Mar 21, 2012 at 5:14 PM, wrote: > > > Owen, > > > > Is there interest in reverting hadoop-2399 in 0.23.x ? > > > > - Milind > > > > --- > > Milind Bhandarkar > > Greenplum Labs, EMC > > (Disclaimer: Opinions expressed in this email are those of the author, > and > > do not necessarily represent the views of any organization, past or > > present, the author might be affiliated with.) > > > > > > > > On 3/19/12 11:20 PM, "Owen O'Malley" wrote: > > > > >On Mon, Mar 19, 2012 at 11:05 PM, madhu phatak > > >wrote: > > > > > >> Hi Owen O'Malley, > > >> Thank you for that Instant reply. It's working now. Can you explain > me > > >> what you mean by "input to reducer is reused" in little detail? > > > > > > > > >Each time the statement "Text value = values.next();" is executed it > > >always > > >returns the same Text object with the contents of that object changed. > > >When > > >you add the Text to the list, you are adding a pointer to the same Text > > >object. At the end you have 6 copies of the same pointer instead of 6 > > >different Text objects. > > > > > >The reason that I said it is my fault, is because I added the > optimization > > >that causes it. If you are interested in Hadoop archeology, it was > > >HADOOP-2399 that made the change. I also did HADOOP-3522 to improve the > > >documentation in the area. > > > > > >-- Owen > > > > > > > -- > Jay Vyas > MMSB/UCHC >
Re: Very strange Java Collection behavior in Hadoop
Hi guys: What are the latest, simplest, best directions to get a tiny, pseudo-distributed Hadoop setup running on my Ubuntu machine? On Wed, Mar 21, 2012 at 5:14 PM, wrote: > Owen, > > Is there interest in reverting hadoop-2399 in 0.23.x ? > > - Milind > > --- > Milind Bhandarkar > Greenplum Labs, EMC > (Disclaimer: Opinions expressed in this email are those of the author, and > do not necessarily represent the views of any organization, past or > present, the author might be affiliated with.) > > > > On 3/19/12 11:20 PM, "Owen O'Malley" wrote: > > >On Mon, Mar 19, 2012 at 11:05 PM, madhu phatak > >wrote: > > > >> Hi Owen O'Malley, > >> Thank you for that Instant reply. It's working now. Can you explain me > >> what you mean by "input to reducer is reused" in little detail? > > > > > >Each time the statement "Text value = values.next();" is executed it > >always > >returns the same Text object with the contents of that object changed. > >When > >you add the Text to the list, you are adding a pointer to the same Text > >object. At the end you have 6 copies of the same pointer instead of 6 > >different Text objects. > > > >The reason that I said it is my fault, is because I added the optimization > >that causes it. If you are interested in Hadoop archeology, it was > >HADOOP-2399 that made the change. I also did HADOOP-3522 to improve the > >documentation in the area. > > > >-- Owen > > -- Jay Vyas MMSB/UCHC
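Owen's point about reused reducer values can be reproduced without Hadoop at all. The sketch below is an editor's illustration, not code from the thread: it uses a plain mutable StringBuilder to stand in for Hadoop's reused Text object, showing why storing the iterator's object gives N copies of the same pointer, and why copying the contents (the equivalent of `new Text(value)` in a reducer) fixes it.

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    // Mimics iterating reducer values when the framework reuses ONE object
    // and only changes its contents between next() calls.
    static List<StringBuilder> collectPointers(String[] vals) {
        StringBuilder reused = new StringBuilder(); // one shared object, like Hadoop's Text
        List<StringBuilder> out = new ArrayList<>();
        for (String v : vals) {
            reused.setLength(0);
            reused.append(v);   // contents change, object identity does not
            out.add(reused);    // BUG: stores the same pointer each time
        }
        return out;
    }

    static List<String> collectCopies(String[] vals) {
        StringBuilder reused = new StringBuilder();
        List<String> out = new ArrayList<>();
        for (String v : vals) {
            reused.setLength(0);
            reused.append(v);
            out.add(reused.toString()); // FIX: copy the contents, like new Text(value)
        }
        return out;
    }

    public static void main(String[] args) {
        String[] vals = {"a", "b", "c"};
        System.out.println(collectPointers(vals)); // [c, c, c] -- all aliases of one object
        System.out.println(collectCopies(vals));   // [a, b, c]
    }
}
```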
IBM China Big Data team recruitment
Please send your resume to jian...@cn.ibm.com

Job Description: Big Data processing is becoming an increasingly hot area in industry, and IBM is investing significantly in it to gain a leadership position in the marketplace. You will join the CDL InfoSphere Big Data (BigInsights) team, an energetic and innovative team working with SVL to architect, design, and develop the next-generation enterprise product in the Big Data area. This new initiative includes a Hadoop-powered distributed parallel data processing system, big data analytics, and management capability for business and IT, supporting structured, semi-structured and unstructured data, designed for enterprise-class analytics and performance. We are looking for technical leaders, developers and QA engineers (including professional hires, campus hires and internal transfers) to bring their unique expertise to build and expand this key initiative. A strong candidate must be able to independently design, code, and test major features, work jointly with other team members to deliver complex product components, and mentor and lead in the design and implementation of large-scale modules and systems.

Job Responsibilities
· Design and implement a scalable and reliable distributed data processing and management infrastructure that spans multiple technologies, including Hadoop, data warehousing, analytics, storage management, indexing, and extreme-volume data movement and management, and optimize hardware and software configurations.
· Design and implement system modules to support componentized, high-performance parallel applications, including communications infrastructure, metadata services, administrative and user interfaces, and client APIs.
· Enhance IBM Hadoop components and their integration with IBM products and other popular products.
· Work with engineers, architects, managers, and quality assurance to design and implement innovative solutions incorporating functionality, performance, scalability, reliability, and adherence to agile development goals and principles.
· Work with customers to propose solutions and help customers implement them.
· More responsibilities depending on emerging customer requirements and your capabilities.

Required Skills:
- Excellent communication skills, including presentation, verbal and written skills, in both English and Chinese
- 5 or more years designing and implementing large scalable systems (for technical leaders)
- 3 or more years leading the architecture, design and development of enterprise software (for technical leaders)
- Strong Java development and object-oriented programming skills, including familiarity with J2EE/Applet/Servlet/JSP/Java/JSON/Python/REST/AJAX
- Understanding of distributed systems, map-reduce algorithms, Hadoop, object-oriented programming, and performance optimization techniques. Hadoop/HBase development or operations experience is a big plus.
- Database server development experience is a plus
- Web application development experience is a plus
- Data warehouse and analytics experience is a plus
- NoSQL experience is a plus
- Ability to work with customers, understand customer business requirements and communicate them to the development organization

Qualifications: Bachelor's or above degree in Computer Science or relevant areas
RE: hadoop on cygwin : tasktracker is throwing error : need help
I got rid of this error by installing the 0.20.2 version. From: Santosh Borse [santosh_bo...@persistent.co.in] Sent: Friday, March 23, 2012 7:52 AM To: common-user@hadoop.apache.org Subject: hadoop on cygwin : tasktracker is throwing error : need help I have installed hadoop on cygwin to help me to write MR code in windows eclipse. 2012-03-22 22:19:57,896 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop-uygwin\mapred\local\ttprivate to 0700 at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682) at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:726) at org.apache.hadoop.mapred.TaskTracker.(TaskTracker.java:1457) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3716) 2012-03-22 22:19:57,897 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: Config details - OS : Win 7 Hadoop : hadoop-1.0.1 Please let me know if you can help. -Santosh DISCLAIMER == This e-mail may contain privileged and confidential information which is the property of Persistent Systems Ltd. It is intended only for the use of the individual or entity to which it is addressed. If you are not the intended recipient, you are not authorized to read, retain, copy, print, distribute or use this message. If you have received this communication in error, please notify the sender and delete all copies of this message. Persistent Systems Ltd. does not accept any liability for virus infected mails.
hadoop on cygwin : tasktracker is throwing error : need help
I have installed hadoop on cygwin to help me to write MR code in windows eclipse. 2012-03-22 22:19:57,896 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop-uygwin\mapred\local\ttprivate to 0700 at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:682) at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:655) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:509) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:344) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:189) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:726) at org.apache.hadoop.mapred.TaskTracker.(TaskTracker.java:1457) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3716) 2012-03-22 22:19:57,897 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: Config details - OS : Win 7 Hadoop : hadoop-1.0.1 Please let me know if you can help. -Santosh
Re: number of partitions
This shouldn't be the case at all. Can you share your Partitioner code and the job.xml of the job that showed this behavior? In any case: How do you "set the numberOfReducer to 4"? 2012/3/23 Harun Raşit ER : > I wrote a custom partitioner. But when I work as standalone or > pseudo-distributed mode, the number of partitions is always 1. I set the > numberOfReducer to 4, but the numOfPartitions parameter of custom > partitioner is still 1 and all my four mappers' results are going to 1 > reducer. The other reducers yield empty files. > > How can i set the number of partitions in standalone or pseudo-distributed > mode? > > thanks for your helps. -- Harsh J
number of partitions
I wrote a custom partitioner, but when I run in standalone or pseudo-distributed mode, the number of partitions is always 1. I set the numberOfReducer to 4, but the numOfPartitions parameter of the custom partitioner is still 1 and all four of my mappers' results are going to one reducer. The other reducers yield empty files. How can I set the number of partitions in standalone or pseudo-distributed mode? Thanks for your help.
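For context on what the framework passes in: the `numPartitions` argument of a Partitioner is simply the configured reducer count, so if it is always 1, the reducer setting is not reaching the job (it must be set on the job/JobConf before submission, e.g. `job.setNumReduceTasks(4)` or `-D mapred.reduce.tasks=4`). Also worth noting, as an editor's aside: if I recall correctly, the Hadoop 1.x standalone LocalJobRunner ran at most one reducer, so pure local mode may legitimately stay at 1 regardless. The sketch below is an illustration (not the poster's code) of the arithmetic the default HashPartitioner uses, showing why one reducer means everything lands in partition 0:

```java
public class PartitionDemo {
    // Same arithmetic as Hadoop's default HashPartitioner; a custom
    // Partitioner's getPartition(key, value, numPartitions) typically
    // ends with an expression like this.
    static int getPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // With numPartitions == 1 every key maps to partition 0 -- exactly
        // the symptom described above (one full reducer, the rest empty).
        System.out.println(getPartition("anyKey", 1)); // 0
        // With job.setNumReduceTasks(4) the framework passes numPartitions = 4:
        System.out.println(getPartition("anyKey", 4)); // some value in [0, 4)
    }
}
```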
Re: Number of retries
Hi Mohit To add on, duplicates won't be there if your output is written to an HDFS file, because if one attempt of a task completes, only that output file is copied to the final output destination and the files generated by other task attempts that are killed are just ignored. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: "Bejoy KS" Date: Thu, 22 Mar 2012 19:55:55 To: Reply-To: bejoy.had...@gmail.com Subject: Re: Number of retries Mohit If you are writing to a db from a job in an atomic way, this would pop up. You can avoid this only by disabling speculative execution. Drilling down from web UI to a task level would get you the tasks where multiple attempts were there. --Original Message-- From: Mohit Anchlia To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Number of retries Sent: Mar 23, 2012 01:21 I am seeing wierd problem where I am seeing duplicate rows in the database. I am wondering if this is because of some internal retries that might be causing this. Is there a way to look at which tasks were retried? I am not sure what else might cause because when I look at the output data I don't see any duplicates in the file. Regards Bejoy KS Sent from handheld, please excuse typos.
Re: Number of retries
Mohit If you are writing to a db from a job in an atomic way, this would pop up. You can avoid this only by disabling speculative execution. Drilling down from web UI to a task level would get you the tasks where multiple attempts were there. --Original Message-- From: Mohit Anchlia To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Number of retries Sent: Mar 23, 2012 01:21 I am seeing wierd problem where I am seeing duplicate rows in the database. I am wondering if this is because of some internal retries that might be causing this. Is there a way to look at which tasks were retried? I am not sure what else might cause because when I look at the output data I don't see any duplicates in the file. Regards Bejoy KS Sent from handheld, please excuse typos.
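Bejoy's suggestion to disable speculative execution corresponds to two per-job properties in Hadoop 1.x. A mapred-site.xml (or per-job configuration) fragment would look roughly like this; with the old JobConf API the equivalent call is `conf.setSpeculativeExecution(false)`:

```xml
<!-- Disable speculative execution so a task attempt is never duplicated,
     which avoids double writes from map/reduce tasks that insert into a DB. -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
<property>
  <name>mapred.reduce.tasks.speculative.execution</name>
  <value>false</value>
</property>
```

Note this trades away the speed-up speculative execution gives on slow nodes; it is only needed for jobs with non-idempotent side effects such as database writes.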
Re: tasktracker/jobtracker.. expectation..
Hi Patai JobTracker automatically handles this situation by attempting the task on different nodes. Could you verify the number of attempts that these failed tasks made. Was that just one? If more, were all the task attempts triggered on the same node or not? Did all of them fail with the same error? You can get this information from the JobTracker web UI: drill down to the task level and then further down into a failed task. Regards Bejoy On Thu, Mar 22, 2012 at 11:25 PM, Patai Sangbutsarakum < silvianhad...@gmail.com> wrote: > Hi all, > > I have a job fail this morning because of 2 tasks were trying to write > into disk that somehow turned read-only. > Originally, i was thinking/dreaming that in this case somehow those 2 > tasks will be exported automatically > to other dn/tt that also has the required data block, and won't fail. > > I strongly believe that Hadoop can do that but i just didn't know it > well enough to enable it. > > /dev/sdj1 /hadoop10 ext3 ro,noatime,data=ordered 0 0 > > Error initializing attempt_201203211854_2633_m_17_0: EROFS: > Read-only file system at > org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method) at > > org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:496) > at > org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:319) > at > org.apache.hadoop.mapred.JobLocalizer.createLocalDirs(JobLocalizer.java:144) > at > org.apache.hadoop.mapred.DefaultTaskController.initializeJob(DefaultTaskController.java:190) > at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1199) > at java.security.AccessController.doPrivileged(Native Method) at > javax.security.auth.Subject.doAs(Subject.java:396) at > > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) > at > org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1174) > at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1089) > at 
org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2257) > at > org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2221) > > Hope this make sense. > Patai >
Re: rack awareness and safemode
Roger that On Thu, Mar 22, 2012 at 10:40 AM, John Meagher wrote: > Make sure you run "hadoop fsck /". It should report a lot of blocks > with the replication policy violated. In the sort term it isn't > anything to worry about and everything will work fine even with those > errors. Run the script I sent out earlier to fix those errors and > bring everything into compliance with the new rack awareness setup. > > > On Thu, Mar 22, 2012 at 13:36, Patai Sangbutsarakum > wrote: >> I restarted the cluster yesterday with rack-awareness enable. >> Things went well. confirm that there was no issues at all. >> >> Thanks you all again. >> >> >> On Tue, Mar 20, 2012 at 4:19 PM, Patai Sangbutsarakum >> wrote: >>> Thanks you all. >>> >>> >>> On Tue, Mar 20, 2012 at 2:44 PM, Harsh J wrote: John has already addressed your concern. I'd only like to add that fixing of replication violations does not require your NN to be in safe mode and it won't be. Your worry can hence be voided :) On Wed, Mar 21, 2012 at 2:08 AM, Patai Sangbutsarakum wrote: > Thanks for your reply and script. Hopefully it still apply to 0.20.203 > As far as I play with test cluster. The balancer would take care of > replica placement. > I just don't want to fall into the situation that the hdfs sit in the > safemode > for hours and users can't use hadoop and start yelping. > > Let's hear from others. > > > Thanks > Patai > > > On 3/20/12 1:27 PM, "John Meagher" wrote: > >>ere's the script I used (all sorts of caveats about it assuming a >>replication factor of 3 and no real error handling, etc)... >> >>for f in `hadoop fsck / | grep "Replica placement policy is violated" >>| head -n8 | awk -F: '{print $1}'`; do >> hadoop fs -setrep -w 4 $f >> hadoop fs -setrep 3 $f >>done >> >> > -- Harsh J
Re: rack awareness and safemode
Make sure you run "hadoop fsck /". It should report a lot of blocks with the replication policy violated. In the short term it isn't anything to worry about and everything will work fine even with those errors. Run the script I sent out earlier to fix those errors and bring everything into compliance with the new rack awareness setup. On Thu, Mar 22, 2012 at 13:36, Patai Sangbutsarakum wrote: > I restarted the cluster yesterday with rack-awareness enable. > Things went well. confirm that there was no issues at all. > > Thanks you all again. > > > On Tue, Mar 20, 2012 at 4:19 PM, Patai Sangbutsarakum > wrote: >> Thanks you all. >> >> >> On Tue, Mar 20, 2012 at 2:44 PM, Harsh J wrote: >>> John has already addressed your concern. I'd only like to add that >>> fixing of replication violations does not require your NN to be in >>> safe mode and it won't be. Your worry can hence be voided :) >>> >>> On Wed, Mar 21, 2012 at 2:08 AM, Patai Sangbutsarakum >>> wrote: Thanks for your reply and script. Hopefully it still apply to 0.20.203 As far as I play with test cluster. The balancer would take care of replica placement. I just don't want to fall into the situation that the hdfs sit in the safemode for hours and users can't use hadoop and start yelping. Let's hear from others. Thanks Patai On 3/20/12 1:27 PM, "John Meagher" wrote: >Here's the script I used (all sorts of caveats about it assuming a >replication factor of 3 and no real error handling, etc)... > >for f in `hadoop fsck / | grep "Replica placement policy is violated" >| head -n8 | awk -F: '{print $1}'`; do > hadoop fs -setrep -w 4 $f > hadoop fs -setrep 3 $f >done > > >>> >>> >>> >>> -- >>> Harsh J
Re: rack awareness and safemode
I restarted the cluster yesterday with rack-awareness enabled. Things went well; I can confirm that there were no issues at all. Thank you all again. On Tue, Mar 20, 2012 at 4:19 PM, Patai Sangbutsarakum wrote: > Thanks you all. > > > On Tue, Mar 20, 2012 at 2:44 PM, Harsh J wrote: >> John has already addressed your concern. I'd only like to add that >> fixing of replication violations does not require your NN to be in >> safe mode and it won't be. Your worry can hence be voided :) >> >> On Wed, Mar 21, 2012 at 2:08 AM, Patai Sangbutsarakum >> wrote: >>> Thanks for your reply and script. Hopefully it still apply to 0.20.203 >>> As far as I play with test cluster. The balancer would take care of >>> replica placement. >>> I just don't want to fall into the situation that the hdfs sit in the >>> safemode >>> for hours and users can't use hadoop and start yelping. >>> >>> Let's hear from others. >>> >>> >>> Thanks >>> Patai >>> >>> >>> On 3/20/12 1:27 PM, "John Meagher" wrote: >>> Here's the script I used (all sorts of caveats about it assuming a replication factor of 3 and no real error handling, etc)... for f in `hadoop fsck / | grep "Replica placement policy is violated" | head -n8 | awk -F: '{print $1}'`; do hadoop fs -setrep -w 4 $f hadoop fs -setrep 3 $f done >>> >> >> >> -- >> Harsh J
Re: setNumTasks
If you want to control the number of input splits at fine granularity, you could customize the NLineInputFormat. You need to determine the number of lines per split, so you need to know beforehand the number of lines in your input data; for instance, hadoop fs -text /input/dir/* | wc -l will give you a number, let's assume it is N. If you have K nodes and each node has C cores, you can basically start K*C map tasks. If you further assume each mapper processes 2 splits (in case some tasks finish earlier), then the optimal number of lines for NLineInputFormat is around N/(2*K*C). This might give you an optimal job balance. Remember, NLineInputFormat usually takes longer than other input formats to initialize, and the line split only counts lines; it is unaware of the content length of each line. Thus, in sequence data analysis, if some lines are significantly longer than others, the mappers assigned the longer lines will be much slower than those assigned the shorter lines, so randomly mixing short and long lines before splitting is preferable. Shi On 3/22/2012 10:01 AM, Bejoy Ks wrote: Hi Mohit The number of map tasks is determined by your number of input splits and the Input Format used by your MR job. Setting this value won't help you control the same. AFAIK it would get effective if the value in mapred.map.tasks is greater than the no of tasks calculated by the Job based on the splits and Input Format. Regards Bejoy KS On Thu, Mar 22, 2012 at 8:28 PM, Mohit Anchlia wrote: Sorry I meant *setNumMapTasks. *What is mapred.map.tasks for? It's confusing as to what it's purpose is for? I tried setting it for my job still I see more map tasks running than *mapred.map.tasks* On Thu, Mar 22, 2012 at 7:53 AM, Harsh J wrote: There isn't such an API as "setNumTasks". There is however, "setNumReduceTasks", which sets "mapred.reduce.tasks". Does this answer your question? 
On Thu, Mar 22, 2012 at 8:21 PM, Mohit Anchlia wrote: Could someone please help me answer this question? On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia What is the corresponding system property for setNumTasks? Can it be used explicitly as system property like "mapred.tasks."? -- Harsh J
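Shi's sizing rule above is straightforward arithmetic; the sketch below (an editor's illustration with made-up cluster numbers) computes the lines-per-split value you would then hand to NLineInputFormat, commonly via the old-API property `mapred.line.input.format.linespermap`:

```java
public class SplitSizing {
    // N total input lines, K nodes, C cores per node; assume each mapper
    // should process ~2 splits so fast mappers can pick up extra work.
    // Optimal lines per split ~= N / (2 * K * C), per the rule above.
    static long linesPerSplit(long totalLines, int nodes, int coresPerNode) {
        return Math.max(1, totalLines / (2L * nodes * coresPerNode));
    }

    public static void main(String[] args) {
        // Hypothetical example: 1,000,000 lines on 10 nodes with 8 cores each.
        System.out.println(linesPerSplit(1_000_000, 10, 8)); // 6250
    }
}
```

The `Math.max(1, ...)` guard is an editor's addition so tiny inputs still get at least one line per split.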
Re: hadoop permission guideline
Hi Michael, Am moving your question to the scm-us...@cloudera.org group which is home to the community of Cloudera Manager users. You will get better responses here. In case you wish to browse or subscribe to this group, visit https://groups.google.com/a/cloudera.org/forum/#!forum/scm-users (BCC'd common-user@) On Thu, Mar 22, 2012 at 8:21 PM, Michael Wang wrote: > I have installed Cloudera hadoop (CDH). I used its Cloudera Manager to > install all needed packages. When it was installed, the root is used. I > found the installation created some users, such as hdfs, hive, > mapred,hue,hbase... > After the installation, should we change some permission or ownership of > some directories/files? For example, to use HIVE. It works fine with root > user, since the metatore directory belongs to root. But in order to let > other user use HIVE, I have to change metastore ownership to a specific > non-root user, then it works. Is it the best practice? > Another example is the start-all.sh, stop-all.sh they all belong to > root. Should I change them to other user? I guess there are more cases... > > Thanks, > > > > This electronic message, including any attachments, may contain > proprietary, confidential or privileged information for the sole use of the > intended recipient(s). You are hereby notified that any unauthorized > disclosure, copying, distribution, or use of this message is prohibited. If > you have received this message in error, please immediately notify the > sender by reply e-mail and delete it. -- Harsh J
Re: setNumTasks
Hi Mohit The number of map tasks is determined by your number of input splits and the Input Format used by your MR job. Setting this value won't help you control the same. AFAIK it would get effective if the value in mapred.map.tasks is greater than the no of tasks calculated by the Job based on the splits and Input Format. Regards Bejoy KS On Thu, Mar 22, 2012 at 8:28 PM, Mohit Anchlia wrote: > Sorry I meant *setNumMapTasks. *What is mapred.map.tasks for? It's > confusing as to what it's purpose is for? I tried setting it for my job > still I see more map tasks running than *mapred.map.tasks* > > On Thu, Mar 22, 2012 at 7:53 AM, Harsh J wrote: > > > There isn't such an API as "setNumTasks". There is however, > > "setNumReduceTasks", which sets "mapred.reduce.tasks". > > > > Does this answer your question? > > > > On Thu, Mar 22, 2012 at 8:21 PM, Mohit Anchlia > > wrote: > > > Could someone please help me answer this question? > > > > > > On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia > >wrote: > > > > > >> What is the corresponding system property for setNumTasks? Can it be > > used > > >> explicitly as system property like "mapred.tasks."? > > > > > > > > -- > > Harsh J > > >
Re: setNumTasks
Sorry, I meant *setNumMapTasks*. What is mapred.map.tasks for? It's confusing as to what its purpose is. I tried setting it for my job but still see more map tasks running than *mapred.map.tasks* On Thu, Mar 22, 2012 at 7:53 AM, Harsh J wrote: > There isn't such an API as "setNumTasks". There is however, > "setNumReduceTasks", which sets "mapred.reduce.tasks". > > Does this answer your question? > > On Thu, Mar 22, 2012 at 8:21 PM, Mohit Anchlia > wrote: > > Could someone please help me answer this question? > > > > On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia >wrote: > > > >> What is the corresponding system property for setNumTasks? Can it be > used > >> explicitly as system property like "mapred.tasks."? > > > > -- > Harsh J >
Re: hadoop permission guideline
Can you please take this discussion to the CDH mailing list? On Mar 22, 2012, at 7:51 AM, Michael Wang wrote: > I have installed Cloudera hadoop (CDH). I used its Cloudera Manager to > install all needed packages. When it was installed, the root is used. I > found the installation created some users, such as hdfs, hive, > mapred,hue,hbase... > After the installation, should we change some permission or ownership of some > directories/files? For example, to use HIVE. It works fine with root user, > since the metatore directory belongs to root. But in order to let other user > use HIVE, I have to change metastore ownership to a specific non-root user, > then it works. Is it the best practice? > Another example is the start-all.sh, stop-all.sh they all belong to root. > Should I change them to other user? I guess there are more cases... > > Thanks,
Re: setNumTasks
There isn't such an API as "setNumTasks". There is however, "setNumReduceTasks", which sets "mapred.reduce.tasks". Does this answer your question? On Thu, Mar 22, 2012 at 8:21 PM, Mohit Anchlia wrote: > Could someone please help me answer this question? > > On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia wrote: > >> What is the corresponding system property for setNumTasks? Can it be used >> explicitly as system property like "mapred.tasks."? -- Harsh J
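To summarize the thread's answer in one place: `setNumReduceTasks(n)` writes `mapred.reduce.tasks` and is honored exactly, while `mapred.map.tasks` (set by `setNumMapTasks`) is only a hint, because the real map count comes from the number of input splits. As a rough per-job configuration sketch:

```xml
<!-- Per-job settings, e.g. via -D on the command line or JobConf. -->
<property>
  <!-- Honored exactly; this is what setNumReduceTasks() writes. -->
  <name>mapred.reduce.tasks</name>
  <value>4</value>
</property>
<property>
  <!-- Only a hint: the InputFormat's split count decides the real
       number of map tasks, which is why Mohit saw more maps running
       than this value. -->
  <name>mapred.map.tasks</name>
  <value>8</value>
</property>
```

To actually reduce the number of map tasks you change the split size or InputFormat (e.g. larger block/split sizes, or CombineFileInputFormat), not this hint.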
hadoop permission guideline
I have installed Cloudera hadoop (CDH). I used its Cloudera Manager to install all needed packages. When it was installed, the root is used. I found the installation created some users, such as hdfs, hive, mapred,hue,hbase... After the installation, should we change some permission or ownership of some directories/files? For example, to use HIVE. It works fine with root user, since the metastore directory belongs to root. But in order to let other user use HIVE, I have to change metastore ownership to a specific non-root user, then it works. Is it the best practice? Another example is the start-all.sh, stop-all.sh they all belong to root. Should I change them to other user? I guess there are more cases... Thanks,
Re: setNumTasks
Could someone please help me answer this question? On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia wrote: > What is the corresponding system property for setNumTasks? Can it be used > explicitly as system property like "mapred.tasks."?
Re: Snappy Error
Looks like org.apache.hadoop.io.compress.SnappyCodec is not in the classpath? On Thu, Mar 22, 2012 at 4:30 AM, hadoop hive wrote: > HI Folks, > > i follow all ther steps and build and install snappy and after creating > sequencetable when i m insert overwrite the data into this table its > throwing this error. > > > java.io.IOException: Cannot create an instance of InputFormat class > org.apache.hadoop.mapred.TextInputFormat as specified in mapredWork! >at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:197) >at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:236) >at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338) >at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) >at org.apache.hadoop.mapred.Child.main(Child.java:170) > Caused by: java.lang.RuntimeException: Error in configuring object >at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) >at > org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) >at > org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) >at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getInputFormatFromCache(HiveInputFormat.java:193) >... 4 more > Caused by: java.lang.reflect.InvocationTargetException >at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) >at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) >at java.lang.reflect.Method.invoke(Method.java:597) >at > org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) >... 7 more > Caused by: java.lang.IllegalArgumentException: Compression codec > org.apache.hadoop.io.compress.SnappyCodec not found. 
>at > org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96) >at > org.apache.hadoop.io.compress.CompressionCodecFactory.(CompressionCodecFactory.java:134) >at > org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:41) >... 12 more > Caused by: java.lang.ClassNotFoundException: > org.apache.hadoop.io.compress.SnappyCodec >at java.net.URLClassLoader$1.run(URLClassLoader.java:200) >at java.security.AccessController.doPrivileged(Native Method) >at java.net.URLClassLoader.findClass(URLClassLoader.java:188) >at java.lang.ClassLoader.loadClass(ClassLoader.java:307) >at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) >at java.lang.ClassLoader.loadClass(ClassLoader.java:252) >at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320) >at java.lang.Class.forName0(Native Method) >at java.lang.Class.forName(Class.java:247) >at > org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762) >at > org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89) >... 14 more >
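The ClassNotFoundException at the bottom of the trace means the codec class (not just the native library) is missing from the task's classpath/configuration. A typical fix, sketched here with illustrative values rather than a verified recipe for this cluster, is to register the codec in core-site.xml and make sure the jar containing SnappyCodec plus the native libsnappy are on every node:

```xml
<!-- core-site.xml: register Snappy alongside the default codecs so
     CompressionCodecFactory can resolve org.apache.hadoop.io.compress.SnappyCodec.
     The SnappyCodec class ships in the Hadoop core jar on versions that
     bundle Snappy support (e.g. 1.0.x); the native libsnappy must also be
     present under the native library path on every node. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

After editing, restart the daemons so TaskTrackers pick up the new configuration; if the error persists, the Hadoop build in use likely predates bundled Snappy support and the codec jar has to be added to HADOOP_CLASSPATH explicitly.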