Database insertion with Hadoop
Dear All, We are running an experiment for a scientific paper, and we must insert data into our database for later analysis: almost 300 tables, each with about 2,000,000 records. As you know, this takes a lot of time on a single machine, so we are going to use our Hadoop cluster (32 machines) and divide the 300 insertion tasks among them. I need some hints to make progress faster: 1- As I understand it, we don't need a Reducer; a Mapper alone is enough. 2- So we only need to implement the Mapper class with the required code. Please let me know if I am missing any point. Best Regards Masoud
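For a map-only job like this, the number of reduce tasks is simply set to zero; the mappers then run with no shuffle or reduce phase at all. A minimal sketch with Hadoop Streaming, where the paths and the `insert_records.sh` script are placeholders rather than anything from this thread:

```shell
# Map-only job: 0 reducers means each map task's work (here, a batch of
# database inserts) runs to completion with no shuffle/reduce phase.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -D mapred.reduce.tasks=0 \
  -input /data/tables \
  -output /logs/inserts \
  -mapper insert_records.sh \
  -file insert_records.sh
```

In the Java API the equivalent is calling `setNumReduceTasks(0)` on the job configuration before submitting.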
Increasing number of Reducers
Hi all, we have a cluster with 32 machines and are running the C# version of the wordcount program on it. The Map phase is done by different machines, but Reduce is only done by one machine. Our data is around 7 GB of text, and with only one machine for the Reduce phase the job runs very slowly. Is there any way to increase the number of reducers? Thanks Masoud
Re: Increasing number of Reducers
Thanks for the reply. As you know, this way we will also get n final results; is there any way to increase the number of Reducers for faster computation but still end up with only one final result? B.S Masoud On 03/20/2012 07:02 PM, bejoy.had...@gmail.com wrote: Hi Masoud Set -D mapred.reduce.tasks=n; ie to any higher value.
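One common pattern for this (a sketch with placeholder names, assuming the job driver accepts the generic `-D` options): keep many reducers for the heavy computation, then merge the per-reducer part files into one file afterwards.

```shell
# Run the job with, say, 16 reducers; output is part-00000 .. part-00015.
hadoop jar yourjob.jar -D mapred.reduce.tasks=16 input output
# Concatenate all part files into a single local result file.
hadoop fs -getmerge output merged-result.txt
```

Alternatively, a small second job with a single reducer can aggregate the first job's output.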
Slaves could not connect on ports 9000 and 9001 of master
Hi all, we have this problem: org.apache.hadoop.ipc.Client: Retrying connect to server: master/*.*.*.*:9000. Already tried 0 time(s). The same problem occurs for 9001. We opened these ports in the master's firewall. We use NAT to set up our Linux network. Let me know your ideas, Thanks, Masoud
Re: Slaves could not connect on ports 9000 and 9001 of master
Dear Harsh, The master can do passwordless SSH to all slaves. From the slaves we can connect to the master, for example by ping, or on port 80 over HTTP, but via Hadoop the slaves cannot connect to the master on ports 9000 and 9001; we opened these ports on the server too. Thanks, Masoud On 03/16/2012 06:04 PM, Harsh J wrote: Does a netstat lookup also show that your master is listening on the right interface, and not loopback (localhost)?
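To follow up on Harsh's netstat suggestion, a quick check on the master could look like this (generic commands; adapt the hostname):

```shell
# Are the NameNode (9000) and JobTracker (9001) listening on the real
# interface, or only on the loopback address?
netstat -tlnp | grep -E ':(9000|9001)'
# If they show 127.0.0.1:9000, check /etc/hosts: the master hostname used in
# fs.default.name / mapred.job.tracker must not resolve to 127.0.0.1.
grep master /etc/hosts
```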
Slaves could not connect on ports 9000 and 9001 of master
Hi all, we made a pilot cluster of 3 machines to test some aspects of Hadoop, and are now trying to set up Hadoop on 32 nodes. The problem is below: org.apache.hadoop.ipc.Client: Retrying connect to server: master/*.*.*.*:9000. Already tried 0 time(s). The same happens for 9001. We opened these ports on the master. We use NAT to set up our Linux network. Let me know your ideas, Thanks, Masoud
Re: setting up a large hadoop cluster
This is not about using Puppet to set up a Hadoop cluster, just about single-node and (2-node) cluster setup in the normal way. Thanks, Masoud On 03/12/2012 02:59 PM, tousif wrote: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ On Mon, Mar 12, 2012 at 11:21 AM, Masoud mas...@agape.hanyang.ac.kr wrote: Dear Patai, Thanks for your reply. We only need to install Hadoop, no HBase or other tools. Could you please point me to some useful sites or docs on using Puppet to set up a Hadoop cluster? Thanks. Masoud. On 03/10/2012 04:30 PM, Patai Sangbutsarakum wrote: We did 2pb clusters by puppet. What did you find unclear? P On Mar 9, 2012, at 21:32, Masoud mas...@agape.hanyang.ac.kr wrote: Hi all, As we know, setting up a Hadoop cluster involves doing different settings on all machines, which is time consuming and ineffective. Does anybody know how to set up a Hadoop cluster easily? Some options such as Puppet do not have enough docs or a clear road map. Thanks, B.S
Re: setting up a large hadoop cluster
Patai, as you know, unfortunately Puppet is not open source and the free version only supports 10 nodes; I think we have to set up our stack manually.. haha Thanks On 03/13/2012 02:43 AM, Patai Sangbutsarakum wrote: Masoud, these are where I started off. https://github.com/seanhead/puppet_module_hadoop And the hadoop puppet module published by Adobe. Hth p
Re: setting up a large hadoop cluster
Dear Joey, Thank you very much for your great help; I hope to find docs there too. Best Regards, Masoud On 03/13/2012 10:35 AM, Joey Echeverria wrote: Masoud, I know that the Puppet Labs website is confusing, but puppet is open source and has no node limit. You can download it from here: http://puppetlabs.com/misc/download-options/ If you're using a Red Hat compatible linux distribution, you can get RPMs from EPEL: http://projects.puppetlabs.com/projects/puppet/wiki/Downloading_Puppet#RPM+Packages If you prefer source, you can get it from github: https://github.com/puppetlabs/puppet If you're curious about the license, it's Apache 2.0: https://github.com/puppetlabs/puppet/blob/master/LICENSE -Joey
Re: setting up a large hadoop cluster
Dear Patai, Thanks for your reply. We only need to install Hadoop, no HBase or other tools. Could you please point me to some useful sites or docs on using Puppet to set up a Hadoop cluster? Thanks. Masoud. On 03/10/2012 04:30 PM, Patai Sangbutsarakum wrote: We did 2pb clusters by puppet. What did you find unclear? P On Mar 9, 2012, at 21:32, Masoud mas...@agape.hanyang.ac.kr wrote: Hi all, As we know, setting up a Hadoop cluster involves doing different settings on all machines, which is time consuming and ineffective. Does anybody know how to set up a Hadoop cluster easily? Some options such as Puppet do not have enough docs or a clear road map. Thanks, B.S
setting up a large hadoop cluster
Hi all, As we know, setting up a Hadoop cluster involves doing different settings on all machines, which is time consuming and ineffective. Does anybody know how to set up a Hadoop cluster easily? Some options such as Puppet do not have enough docs or a clear road map. Thanks, B.S
Best way for setting up a large cluster
Hi all, I installed Hadoop on a pilot cluster with 3 machines and am now going to build our actual cluster with 32 nodes. As you know, setting up Hadoop separately on every node is time consuming and far from ideal. What is the best way or tool to set up a Hadoop cluster (except Cloudera)? Thanks, B.S
hadoop 1.0 / HOD or CloneZilla?
Hi all, I have experience with Hadoop 0.20.204 on a 3-machine pilot cluster, and now I am trying to set up a real cluster on 32 Linux machines. I have some questions: 1. Is Hadoop 1.0 stable? On the Hadoop site this version is marked as a beta release. 2. As you know, installing and setting up Hadoop on all 32 machines separately is not a good idea, so what can I do? a. Use Hadoop on Demand (HOD)? b. Or use an OS image replication tool such as CloneZilla? I think this method is better because, in addition to Hadoop, I can clone other settings such as SSH or Samba to all machines. Let me know your ideas, B.S, Masoud.
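Besides HOD or disk imaging, a plain loop over the slaves file is often enough to avoid per-node setup. A rough sketch assuming passwordless SSH and an identical install path on every node (hostnames and paths are placeholders):

```shell
# Push the same Hadoop tree (binaries + conf) to every slave
# listed in conf/slaves.
for host in $(cat /opt/hadoop/conf/slaves); do
  rsync -az --delete /opt/hadoop/ "$host:/opt/hadoop/"
done
```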
Re: Killing hadoop jobs automatically
Dear Praveenesh, I think there are only two ways to kill a job: 1- the kill command (not a perfect way, since you have to know the job id); 2- mapred.task.timeout (in the bin/hadoop jar command, use {-Dmapred.task.timeout=} to set your desired value in msec). This sometimes happens for me too; not on all machines, but on certain machines jobs execute more slowly than on others, I think because of hardware problems. As far as I know, shuffling is done by Hadoop and we can only influence it by setting the output format class. Be aware that it is normal for some jobs to finish later than others, so don't be too sensitive about it; Hadoop manages all of this, and the overall result is our goal in Hadoop-based computation. I hope this is helpful. Good Luck, Masoud, On 01/30/2012 06:07 PM, praveenesh kumar wrote: @ Harsh - Yeah, mapred.task.timeout is the valid option. but for some reasons, its not happening the way it should be.. I am not sure what could be the cause. Thing is my jobs are running fine, its just that they are slow at shuffling phase, sometimes.. not everytime.. so I was thinking as an admin - can we control the running of jobs, just as a test, where we can just kill the jobs who are taking more time for execution -- not only those jobs that are hanging.. but jobs that are taking more execution time than expected. Problems in my case is, end-user doesn't want to go through the pain of managing/controlling jobs over hadoop. They want all these job handling should happen automatically, so that made me to think in such a way (which I know is not the best way) Anyways, going away from the topic -- Is there anyway through which I can improve my shuffling (through any configuration parameters only, knowing the fact that users doesn't know the idea of minimizing the key/value pairs) Thanks, Praveenesh On Mon, Jan 30, 2012 at 1:06 PM, Masoud mas...@agape.hanyang.ac.kr wrote: Hi, Every Map/Reduce app has a Reporter. You can set the configuration parameter {mapred.task.timeout} of the Reporter to your desired value. Good Luck. On 01/30/2012 04:14 PM, praveenesh kumar wrote: Yeah, I am aware of that, but it needs you to explicity monitor the job and look for jobid and then hadoop job -kill command. What I want to know - Is there anyway to do all this automatically by providing some timer or something -- that if my job is taking more than some predefined time, it would get killed automatically Thanks, Praveenesh On Mon, Jan 30, 2012 at 12:38 PM, Prashant Kommireddi prash1...@gmail.com wrote: You might want to take a look at the kill command : hadoop job -kill jobid. Prashant On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar praveen...@gmail.com wrote: Is there anyway through which we can kill hadoop jobs that are taking enough time to execute ? What I want to achieve is - If some job is running more than _some_predefined_timeout_limit, it should be killed automatically. Is it possible to achieve this, through shell scripts or any other way ? Thanks, Praveenesh
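To sketch both halves of the thread's answer (values are hypothetical; the `hadoop job -list` output columns can differ between Hadoop versions, so the awk field positions are an assumption): the per-task timeout is passed at submission time, and a cron-driven watchdog can approximate a whole-job time limit.

```shell
# 1) Fail any task that reports no progress for 10 minutes:
hadoop jar myjob.jar -Dmapred.task.timeout=600000 input output

# 2) Watchdog sketch: kill running jobs older than one hour,
#    assuming column 1 is the job id and column 3 the start time in ms.
MAX_SECONDS=3600
now=$(date +%s)
hadoop job -list | awk '/^job_/ {print $1, $3}' | while read jobid start_ms; do
  if [ $(( now - start_ms / 1000 )) -gt "$MAX_SECONDS" ]; then
    hadoop job -kill "$jobid"
  fi
done
```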
Re: Best Linux Operating system used for Hadoop
Hi, I suggest Fedora; in my opinion it's more powerful than the other distributions. I have run Hadoop on it without any problem. Good luck On 01/27/2012 06:15 PM, Sujit Dhamale wrote: Hi All, I am new to Hadoop. Can anyone tell me which is the best Linux operating system for installing and running Hadoop? These days I am using Ubuntu 11.04; I installed Hadoop on it, but it crashes a number of times. Can someone please help me out? Kind regards Sujit Dhamale
map/reduce by C# Hadoop
Dear All, has anyone done this before: map/reduce in C# on Hadoop??? As you know, to develop a map/reduce app in Hadoop we should extend and implement special map and reduce abstract classes and interfaces, and Hadoop Pipes is for C++, not C#. The question is what we should do for C#. *IS IT RIGHT THAT* 1- We just develop our C# code (maybe better with MonoDevelop) following the map/reduce logic, implementing map and reduce classes; 2- Then introduce the map and reduce classes to Hadoop Streaming. Does Hadoop Streaming work with a .dll file? Thanks for your help.
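For point 2, the usual shape of a streaming invocation under Mono would be something like the following (file names are placeholders; this assumes the C# programs are console executables that read stdin and write stdout, since streaming launches a command line and cannot load a bare .dll):

```shell
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -input in -output out \
  -mapper "mono WordCountMapper.exe" \
  -reducer "mono WordCountReducer.exe" \
  -file WordCountMapper.exe \
  -file WordCountReducer.exe
```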
Re: map/reduce by C# Hadoop
Thanks for your reply^^ but the question is how we can write map/reduce in C# without using the Java abstract classes and interfaces of Hadoop; should we only write C# code following the map/reduce app logic? What do you mean by the app running standalone? Do you mean it should be an .exe or a .dll? Have you done map/reduce in C# on Hadoop before? Oh, lots of questions, sorry ... B.S On 12/27/2011 06:13 PM, Harsh J wrote: I haven't used Mono but if your written program can run as a standalone and read from stdin and write to stdout, then streaming is sufficient to run your C# MR programs.
Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....
Dear Uma, as you know, when we use the start-all.sh command all the output is saved in log files; when I check the tasktracker log file I see the error message I posted earlier, and it shuts down. I'm really confused; I have been working on this issue for more than 4 days and have tried different ways, with no result.^^ BS. Masoud On 11/03/2011 08:34 PM, Uma Maheswara Rao G 72686 wrote: It won't display anything on the console; only if you get an error while executing the command will it be displayed on the console. In your case it might have executed successfully. Are you still facing the same problem with TT startup? Regards, Uma
Re: Hadoop + cygwin
Dear Joey, as you know, when installing Cygwin on a new version of Windows, a new user (cyg_server) is created by Cygwin for SSH to localhost. My main user is Administrator. I ran namenode -format and the DFS was created in /tmp/Administrator-hadoop/~~~, but the tasktracker is started by cyg_server and makes /tmp/cyg_server-hadoop/~~~ again. I tried these: * I reinstalled Cygwin and managed to change cyg_server to Administrator; now the tasktracker is run by Administrator too, but I get the same error. * I changed hadoop.tmp.dir to a directory inside Cygwin; again the same error. *I think* the problem is related to Java: it creates HDFS based on Windows paths and permissions. When I checked, the HDFS permissions under Cygwin show '-', meaning nothing, and then when the tasktracker tries to change the permissions to Linux-style ones it can't. I don't know what the solution is; I have been playing with this issue for around 4 days with no result. BS. Masoud On 11/03/2011 08:19 PM, Joey Echeverria wrote: What are the permissions on \tmp\hadoop-cyg_server\mapred\local\ttprivate? Which user owns that directory? Which user are you starting you TaskTracker as? -Joey On Wed, Nov 2, 2011 at 9:29 PM, Masoud mas...@agape.hanyang.ac.kr wrote: Hi, I'm running hadoop 0.20.204 under cygwin 1.7 on Win7, java 1.6.22. I got this error: 2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: *Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700 * On 11/01/2011 08:12 PM, Rita wrote: Why ?
The beauty of hadoop is its OS agnostic. What is your native operating system? I am sure you have a version of JDK and JRE running there. On Tue, Nov 1, 2011 at 4:53 AM, Masoudmas...@agape.hanyang.ac.krwrote: Hi Anybody ran hadoop on cygwin for development purpose??? Did you have any problem in running tasktracker? Thanks
Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....
Hi, what do you mean by "cygwin is on my path"? I added c:/cygwin/bin and c:/cygwin/usr/sbin to the Windows PATH... Did you have this problem too? Let me know how you fixed it. Thanks, B.R On 11/02/2011 01:29 AM, Shevek wrote: Smells like failure to execute chmod to me; make sure cygwin is on your path? On 1 November 2011 01:38, Uma Maheswara Rao G 72686 mahesw...@huawei.com wrote: Looks, that is a permissions related issue on local dirs. There is an issue filed in mapred related to this problem: https://issues.apache.org/jira/browse/MAPREDUCE-2921 Can you please provide permissions explicitly and try? Regards, Uma
Re: Hadoop + cygwin
Hi, I'm running Hadoop 0.20.204 under Cygwin 1.7 on Win7, Java 1.6.22, and I got this error: 2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: *Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700 * I tried different ways; I even set {hadoop.tmp.dir} to the Cygwin home dir, with the same result. The problem is that Java uses Windows paths and file permissions to create HDFS, and I think under Cygwin, which simulates Linux behaviour, Hadoop cannot change the Windows permissions to Linux permissions for the folder mentioned in the error message. Maybe it's a bug that should be fixed in the source code. Any ideas, please? Thanks, Masoud. On 11/01/2011 08:12 PM, Rita wrote: Why ? The beauty of hadoop is its OS agnostic. What is your native operating system? I am sure you have a version of JDK and JRE running there. On Tue, Nov 1, 2011 at 4:53 AM, Masoud mas...@agape.hanyang.ac.kr wrote: Hi Anybody ran hadoop on cygwin for development purpose??? Did you have any problem in running tasktracker? Thanks
Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....
Hi, thanks for the info. I checked that report; it seems the same as mine, but no specific solution is mentioned. Yes, I changed this folder's permissions via Cygwin, NO RESULT. I'm really confused. ... any ideas please ...? Thanks, B.S On 11/01/2011 05:38 PM, Uma Maheswara Rao G 72686 wrote: Looks, that is a permissions related issue on local dirs. There is an issue filed in mapred related to this problem: https://issues.apache.org/jira/browse/MAPREDUCE-2921 Can you please provide permissions explicitly and try? Regards, Uma
Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....
Sure, ^^ when I run {namenode -format} it makes the dfs in c:/tmp/administrator_hadoop/; after that, by running start-all.sh everything is OK, all daemons run except the tasktracker. My current user is Administrator, but the tasktracker runs as the cyg_server user that was created by Cygwin at installation time. This is a part of the log file: 2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as cyg_server 2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /tmp/hadoop-cyg_server/mapred/local 2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700 at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:680) at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653) at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:483) at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318) at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183) at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:741) at org.apache.hadoop.mapred.TaskTracker.init(TaskTracker.java:1463) at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3611) 2011-11-01 14:26:54,479 INFO org.apache.hadoop.mapred.TaskTracker: SHUTDOWN_MSG: / Thanks, BR. On 11/01/2011 04:33 PM, Uma Maheswara Rao G 72686 wrote: Can you please give some trace? - Original Message - From: Masoud mas...@agape.hanyang.ac.kr Date: Tuesday, November 1, 2011 11:08 am Subject: under cygwin JUST tasktracker run by cyg_server user, Permission denied . To: common-user@hadoop.apache.org Hi I have problem in running hadoop under cygwin 1.7 only tasktracker ran by cyg_server user and so make some problems, so any idea please??? BS. Masoud.
Hadoop + cygwin
Hi, has anybody run Hadoop on Cygwin for development purposes??? Did you have any problems running the tasktracker? Thanks
under cygwin JUST tasktracker run by cyg_server user, Permission denied .....
Hi, I have a problem running Hadoop under Cygwin 1.7: only the tasktracker is run by the cyg_server user, and this causes some problems. Any ideas please??? BS. Masoud.
Re: Hadoop + Cygwin , IOException, /TMP dir
Dear Harsh, I know that, but where can I set it? I couldn't find the place. Thanks On 10/28/2011 04:53 PM, Harsh J wrote: Masoud, You can change your temp-files location by overriding hadoop.tmp.dir with your desired, proper path. Hopefully, that should help you. On Fri, Oct 28, 2011 at 12:04 PM, Masoud mas...@agape.hanyang.ac.kr wrote: Hi, I installed cygwin on win7; when I run the hadoop examples it makes a /tmp dir in C:/ (the Windows install dir), not in c:/cygwin (the cygwin install dir), so a java IOException happens. Any solution? Thanks, BS
Re: Hadoop + Cygwin , IOException, /TMP dir
I did that before, but it is not working; I'll try again.. Cygwin has gone crazy, really. I have another problem adding JAVA_HOME to Cygwin. Thanks On 10/28/2011 05:09 PM, Harsh J wrote: Masoud, You can set hadoop.tmp.dir in core-site.xml inside your $HADOOP_HOME/conf directory.
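For reference, the override Harsh describes has this shape in $HADOOP_HOME/conf/core-site.xml (the path is only an example):

```xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/Administrator/hadoop-tmp</value>
</property>
```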
Hadoop tasktracker shutdown in CYGWIN
Hi, When I run Hadoop under Cygwin, all daemons start well except the tasktracker. This is the log file error message: ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Failed to set permissions of path: \home\Administrator\software\hadoop-tmp\mapred\local\ttprivate to 0700 I have really been going crazy these days because of Cygwin ^^ Thanks, B.S
Hadoop 0.20.204 eclipse plugin
Hi, I'm trying to set up a Hadoop development node on Windows. As you know, an Eclipse plugin was released with Hadoop 0.20.204. Do you know which version of Eclipse this plugin matches? Does it work on 3.7? Thanks, BS.
implicit addressing in hadoop config files
Dear friends, I have copied my Hadoop conf directory from {hadoop_inst_dir} to another place. When I set this new location with an absolute path, e.g. /home/masoud/software/hadoop-conf, in {hadoop_inst_dir}/bin/hadoop, everything is OK, but when I set it with a relative path, e.g. ../../hadoop-conf, Hadoop cannot find it. PS: the new Hadoop conf dir is located exactly two folders above {hadoop_inst_dir}/bin/. Best Regards, Masoud.
Re: implicit addressing in hadoop config files
On 10/19/2011 06:19 PM, Masoud wrote: Dear friends, I have copied my Hadoop conf directory from {hadoop_inst_dir} to another place. When I set this new location with an absolute path everything is OK, but with a relative path Hadoop cannot find it. ... I found the solution: a little bit of shell scripting in the bin/hadoop file ^^
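The likely cause: a relative path like ../../hadoop-conf is resolved against whatever directory you happen to invoke bin/hadoop from, not against the script's own location. A small sketch of the kind of shell fix Masoud describes, resolving the path relative to the script's directory (directory names here are illustrative, not from the actual bin/hadoop script):

```shell
# Simulate the layout from the thread: the conf dir sits exactly two
# levels above the install's bin/ directory.
BASE="$(mktemp -d)"
mkdir -p "$BASE/hadoop-conf" "$BASE/hadoop-1.0/bin"

# Inside bin/hadoop this would be: SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
SCRIPT_DIR="$BASE/hadoop-1.0/bin"

# Resolve the relative path against the script's directory, not the
# caller's working directory, and normalize it to an absolute path:
HADOOP_CONF_DIR="$(cd "$SCRIPT_DIR/../../hadoop-conf" && pwd)"
echo "$HADOOP_CONF_DIR"

rm -rf "$BASE"
```

With this, the script finds the conf dir no matter where it is launched from; the bare relative path only worked when the current working directory happened to be bin/.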
Re: difference between development and production platform???
Dear Steve, thanks for your useful comments. I completely agree with you; personally, for more than 10 years I have used only Fedora, Java, Java-related technologies, and open-source software in all of my projects. But this is a critical situation: all of the current data and applications in our university's lab are deployed on a Microsoft platform. We can transfer our data from Windows to Linux, but all of the code is written in C#; we could connect the C# code to Hadoop and run it on Linux too, but personally I can't guarantee the result. *SO AS A SUMMARY*: 1- we should use only Linux machines for the production platform, 2- and use Windows only as the *development platform* in pseudo-distributed mode. AM I RIGHT about 1 and 2? Please correct or verify them. Thanks, BS. Masoud. 2011/9/28 Steve Loughran ste...@apache.org On 28/09/11 04:19, Hamedani, Masoud wrote: Special thanks for your help, Arko. You mean that in Hadoop, the NameNode, DataNodes, JobTracker, TaskTrackers, and all the cluster nodes should be deployed on Linux machines? ... Am I right? What is really meant is that nobody runs Hadoop at scale on Windows. Specifically: there's an expectation that there is a Unix API you can exec; some of the operations (e.g. how programs are exec()'d) are optimised for Linux; and everyone tests on 50+ node clusters on Linux. Why Linux? Stable, low cost. And you can install it on your laptop/desktop and develop there too. Because everyone uses Linux (or possibly a genuine Unix system like Solaris), problems encountered in real systems get found on Linux and fixed. If you want to run a production Hadoop cluster on Windows, you are free to do so.
Just be aware that you may be the first person to do so at scale, so you get to find the problems first, you get to file the bugs, and, because you are the only person with these problems and the ability to replicate them, you get to fix them. Nobody is going to say "oh, this patch is for Windows-only use, we will reject it", at least provided it doesn't have adverse effects on Linux/Unix. It's just that nobody else publicly runs Hadoop on Windows. A key step 1 will be cross-compiling all the native code to Windows, which on 0.23+ also means Protocol Buffers. Enjoy. Where you will find problems is that even on Win64, Hadoop can't directly load or run C# apps or anything else written to compile against their managed runtime (I forget its name). You will have to bridge via Streaming, and take a performance hit. You could also try running the C# code under Mono on Linux; it may or may not work. Again, you get to find out and fix the problems, this time with the Mono project. -Steve
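To illustrate the Streaming bridge Steve mentions: Hadoop Streaming runs any executable that reads input lines on stdin and writes tab-separated key/value pairs on stdout, so a C# binary (e.g. launched under Mono) could be plugged in the same way as this minimal shell wordcount mapper. This is a generic sketch of the Streaming contract, not code from anyone in this thread:

```shell
# Minimal wordcount mapper obeying the Hadoop Streaming contract:
# read text lines on stdin, emit one "word<TAB>1" pair per token on stdout.
map_words() {
  awk '{ for (i = 1; i <= NF; i++) printf "%s\t1\n", $i }'
}

# Example: feed it a line and capture the pairs it emits.
OUT="$(printf 'to be or not to be\n' | map_words)"
echo "$OUT"
```

A job would then be submitted with something like `hadoop jar hadoop-streaming.jar -mapper mapper.sh -reducer reducer.sh -input in -output out` (jar path and script names here are placeholders); each pipe through an external process is part of the performance hit Steve describes.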
Re: difference between development and production platform???
Special thanks for your help, Arko. You mean that in Hadoop, the NameNode, DataNodes, JobTracker, TaskTrackers, and all the cluster nodes should be deployed on Linux machines? We have lots of data (on Windows) and code (written in C#) for data mining; we want to use Hadoop and connect our existing systems and programs to it. As you mentioned, we should move all of our data to Linux systems, execute the existing C# code on Linux, and only use Windows for development, same as before. Am I right? Thanks, B.S Masoud. 2011/9/28 Arko Provo Mukherjee arkoprovomukher...@gmail.com Hi, A development platform is the system(s) used mainly by developers to write and unit-test code for the project. There are generally NO end users on the development system. The production platform is where the end users actually work; the project is generally moved there only after it has been tested on one or more test platforms. Typically, if the developer is the end user, which is the case sometimes (even more likely for university projects), there's generally no need to run your project on separate production or test system(s). The documentation means that you can use Hadoop on Win32 for developing your code, but if you then run production boxes on Win32 (i.e. end users are using a Win32 Hadoop system), that is not supported. Correct me, guys, if I am wrong. Thanks & regards, Arko On Tue, Sep 27, 2011 at 9:32 PM, Hamedani, Masoud mas...@agape.hanyang.ac.kr wrote: Dear friends, I'm new to Hadoop, here for an important data-mining university research project. I saw these sentences in different Hadoop-related docs: { Win32 is supported as a *development platform*, not as a *production platform*, but Linux is supported as both. } What's the difference between a *development platform* and a *production platform*? Does it mean DataNode and NameNode? Thanks, B.S
Re: difference between development and production platform???
Thanks for your nice help, Arko. Maybe because I'm new to Hadoop I can't get some of the points; I'm studying the Hadoop manual more deeply to get better information. B.S Masoud. 2011/9/28 Arko Provo Mukherjee arkoprovomukher...@gmail.com Hi, You don't necessarily need to execute the C# code on Linux. You can write a middleware application to bring the data from the Windows boxes to the Linux (Hadoop) boxes if you want to. Cheers, Arko