Database insertion by Hadoop

2013-02-18 Thread Masoud

Dear All,

We are going to run the experiments for a scientific paper, and
we must insert data into our database for later analysis. There are almost
300 tables, each with 2,000,000 records.

As you know, it takes a lot of time to do this on a single machine,
so we are going to use our Hadoop cluster (32 machines) and divide the 300
insertion tasks among them.

I need some hints to make this go faster:
1- As far as I know, we don't need a Reducer; a Mapper alone is enough.
2- So we only need to implement a Mapper class with the required code.

Please let me know if there is anything I am missing,

Best Regards
Masoud
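For reference, a minimal map-only job along the lines of points 1 and 2 above
might look like the sketch below (old mapred API of Hadoop 0.20/1.x; the
db.url/db.user/db.password job properties, the table name, and the SQL are
illustrative placeholders, not details from the thread):

import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class DbInsertJob {

  // Each map task opens one connection and inserts the lines of its split.
  public static class InsertMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

    private Connection conn;
    private PreparedStatement stmt;

    @Override
    public void configure(JobConf job) {
      try {
        // db.url / db.user / db.password are hypothetical job properties.
        conn = DriverManager.getConnection(
            job.get("db.url"), job.get("db.user"), job.get("db.password"));
        stmt = conn.prepareStatement("INSERT INTO some_table VALUES (?)");
      } catch (Exception e) {
        throw new RuntimeException("cannot open DB connection", e);
      }
    }

    @Override
    public void map(LongWritable offset, Text line,
                    OutputCollector<NullWritable, NullWritable> out,
                    Reporter reporter) throws IOException {
      try {
        stmt.setString(1, line.toString());   // one input line -> one row
        stmt.executeUpdate();
      } catch (Exception e) {
        throw new IOException(e);
      }
    }

    @Override
    public void close() throws IOException {
      try { stmt.close(); conn.close(); } catch (Exception ignored) { }
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(DbInsertJob.class);
    conf.setJobName("db-insert");
    conf.setNumReduceTasks(0);                     // map-only: no reducer
    conf.setMapperClass(InsertMapper.class);
    conf.setOutputKeyClass(NullWritable.class);
    conf.setOutputValueClass(NullWritable.class);
    conf.setOutputFormat(NullOutputFormat.class);  // nothing written to HDFS
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    JobClient.runJob(conf);
  }
}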



Increasing number of Reducers

2012-03-20 Thread Masoud

Hi all,

We have a cluster of 32 machines and are running a C# version of the wordcount
program on it.
The map phase is spread across the machines, but the reduce phase is done by
only one machine. Our data is around 7 GB of text, and with a single machine
doing the reduce phase the job runs very slowly.

Is there any way to increase the number of reducers?

Thanks
Masoud


Re: Increasing number of Reducers

2012-03-20 Thread Masoud

Thanks for the reply.

As you know, this way we will also have n final results;
is there any way to increase the number of reducers for faster computation
but still have only one final result?


B.S
Masoud

On 03/20/2012 07:02 PM, bejoy.had...@gmail.com wrote:

Hi Masoud
Set -D mapred.reduce.tasks=n, i.e. to any higher value.
Sent from BlackBerry® on Airtel

-Original Message-
From: Masoud <mas...@agape.hanyang.ac.kr>
Date: Tue, 20 Mar 2012 17:52:58
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Increasing number of Reducers

Hi all,

We have a cluster of 32 machines and are running a C# version of the wordcount
program on it.
The map phase is spread across the machines, but the reduce phase is done by
only one machine. Our data is around 7 GB of text, and with a single machine
doing the reduce phase the job runs very slowly.
Is there any way to increase the number of reducers?

Thanks
Masoud
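One common pattern for getting both parallelism and a single result (a sketch,
not something suggested in the thread itself): keep the higher reducer count
for the computation, then concatenate the n part files afterwards, for example
with FileUtil.copyMerge from the Hadoop 1.x API. The paths are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergeOutputs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // Concatenate every part-* file under the job's output dir into one file.
    FileUtil.copyMerge(fs, new Path("/user/masoud/wordcount-out"),    // reducer outputs
                       fs, new Path("/user/masoud/wordcount-merged"), // single result
                       false, conf, null);
  }
}

From the command line, hadoop fs -getmerge <outdir> <localfile> does roughly
the same thing into a local file.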




Slaves could not connect on 9000 and 9001 ports of master

2012-03-16 Thread Masoud

Hi all,

we have this problem:

org.apache.hadoop.ipc.Client: Retrying connect to server: 
master/*.*.*.*:9000. Already tried 0 time(s).


The same problem occurs for port 9001, even though we opened these ports in the
master's firewall.
We use NAT to set up our Linux network.

Let me know your ideas,

Thanks,
Masoud


Re: Slaves could not connect on 9000 and 9001 ports of master

2012-03-16 Thread Masoud

Dear Harsh,

The master can do passwordless ssh to all slaves,
and from the slaves we can connect to the master, for example by ping, or over
HTTP on port 80.

But via hadoop, the slaves cannot connect to the master on ports 9000 and 9001,
even though we opened these ports on the server too.

Thanks,
Masoud

On 03/16/2012 06:04 PM, Harsh J wrote:

Does a netstat lookup also show that your master is listening on the
right interface, and not loopback (localhost)?

On Fri, Mar 16, 2012 at 2:29 PM, Masoud <mas...@agape.hanyang.ac.kr> wrote:

Hi all,

we have this problem:

org.apache.hadoop.ipc.Client: Retrying connect to server:
master/*.*.*.*:9000. Already tried 0 time(s).

The same problem occurs for port 9001, even though we opened these ports in the
master's firewall. We use NAT to set up our Linux network.

Let me know your ideas,

Thanks,
Masoud
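A typical next step for Harsh's question (a hedged aside, not from the thread):
run netstat -tlnp | grep 9000 on the master. If the NameNode shows up bound to
127.0.0.1:9000, the daemons are listening on loopback only, which usually means
the config addresses point at localhost. The usual fix is to put the master's
network hostname in the addresses on every node, for example (hostname "master"
as used in the thread):

<!-- conf/core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://master:9000</value>
</property>

<!-- conf/mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>master:9001</value>
</property>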







slaves could not connect on 9000 and 9001 ports of master

2012-03-15 Thread Masoud

Hi all,

We made a pilot cluster of 3 machines and tested some aspects of hadoop.
Now we are trying to set up hadoop on 32 nodes, and the problem is below:

org.apache.hadoop.ipc.Client: Retrying connect to server: 
master/*.*.*.*:9000. Already tried 0 time(s).


The same happens even for 9001, though we opened these ports on the master.
We use NAT to setup our Linux network.

Let me know your ideas,

Thanks,
Masoud


Re: setting up a large hadoop cluster

2012-03-12 Thread Masoud


This tutorial is not about using Puppet to set up a hadoop cluster; it only
covers single-node and two-node cluster setup in the normal way.


Thanks,
Masoud


On 03/12/2012 02:59 PM, tousif wrote:

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/



On Mon, Mar 12, 2012 at 11:21 AM, Masoud <mas...@agape.hanyang.ac.kr> wrote:


Dear Patai,

Thanks for your reply.
We only need to install hadoop, no HBase or other tools.
Could you please point us to some useful sites or docs on using puppet for
setting up a hadoop cluster?

Thanks.
Masoud.


On 03/10/2012 04:30 PM, Patai Sangbutsarakum wrote:


We did 2 PB clusters with puppet.
What did you find unclear?

P

On Mar 9, 2012, at 21:32, Masoud <mas...@agape.hanyang.ac.kr> wrote:

  Hi all,

As we know, setting up a hadoop cluster involves applying different settings
on all machines, which is time consuming and error prone.
Does anybody know how to set up a hadoop cluster easily?
Approaches such as puppet do not seem to have enough docs or a clear road map.

Thanks,
B.S








Re: setting up a large hadoop cluster

2012-03-12 Thread Masoud

Patai,

As you know, unfortunately Puppet is not open source, and the free version
only supports 10 nodes.

I think we have to set up our stack manually... haha

Thanks

On 03/13/2012 02:43 AM, Patai Sangbutsarakum wrote:

Masoud,
these are where I started off:
https://github.com/seanhead/puppet_module_hadoop
And the hadoop puppet module published by Adobe.

Hth
p

On 3/11/12 11:26 PM, Masoud <mas...@agape.hanyang.ac.kr> wrote:


This tutorial is not about using Puppet to set up a hadoop cluster; it only
covers single-node and two-node cluster setup in the normal way.

Thanks,
Masoud








Re: setting up a large hadoop cluster

2012-03-12 Thread Masoud

Dear Joey,

Thank you very much for your great help.
I hope I can find docs there too.

Best Regards,
Masoud

On 03/13/2012 10:35 AM, Joey Echeverria wrote:

Masoud,

I know that the Puppet Labs website is confusing, but puppet is open
source and has no node limit. You can download it from here:

http://puppetlabs.com/misc/download-options/

If you're using a Red Hat compatible linux distribution, you can get
RPMs from EPEL:

http://projects.puppetlabs.com/projects/puppet/wiki/Downloading_Puppet#RPM+Packages

If you prefer source, you can get it from github:

https://github.com/puppetlabs/puppet

If you're curious about the license, it's Apache 2.0:

https://github.com/puppetlabs/puppet/blob/master/LICENSE

-Joey

On Mon, Mar 12, 2012 at 8:12 PM, Masoud <mas...@agape.hanyang.ac.kr> wrote:

Patai,

As you know, unfortunately Puppet is not open source, and the free version only
supports 10 nodes.
I think we have to set up our stack manually... haha

Thanks











Re: setting up a large hadoop cluster

2012-03-11 Thread Masoud

Dear Patai,

Thanks for your reply.
We only need to install hadoop, no HBase or other tools.
Could you please point us to some useful sites or docs on using puppet for
setting up a hadoop cluster?


Thanks.
Masoud.

On 03/10/2012 04:30 PM, Patai Sangbutsarakum wrote:

We did 2 PB clusters with puppet.
What did you find unclear?

P

On Mar 9, 2012, at 21:32, Masoud <mas...@agape.hanyang.ac.kr> wrote:


Hi all,

As we know, setting up a hadoop cluster involves applying different settings on
all machines, which is time consuming and error prone.
Does anybody know how to set up a hadoop cluster easily?
Approaches such as puppet do not seem to have enough docs or a clear road map.

Thanks,
B.S





setting up a large hadoop cluster

2012-03-09 Thread Masoud

Hi all,

As we know, setting up a hadoop cluster involves applying different settings
on all machines, which is time consuming and error prone.

Does anybody know how to set up a hadoop cluster easily?
Approaches such as puppet do not seem to have enough docs or a clear road map.

Thanks,
B.S



Best way for setting up a large cluster

2012-03-08 Thread Masoud

Hi all,

I installed hadoop on a pilot cluster of 3 machines and am now going to
build our actual cluster with 32 nodes.
As you know, setting up hadoop separately on every node is time
consuming and not an ideal approach.

What is the best way or tool to set up a hadoop cluster (except Cloudera)?

Thanks,
B.S


hadoop 1.0 / HOD or CloneZilla?

2012-03-05 Thread Masoud

Hi all,

I have experience with hadoop 0.20.204 on a 3-machine pilot cluster; now
I'm trying to set up a real cluster on 32 Linux machines.

I have some questions:

1. Is hadoop 1.0 stable? On the hadoop site this version is indicated as a
   beta release.

2. As you know, installing and setting up hadoop on all 32 machines
   separately is not a good idea, so what can I do?
   1. Use Hadoop On Demand (HOD)?
   2. Or use an OS image replication tool such as CloneZilla? I think
      this method is better because, in addition to hadoop, I can clone
      other settings such as SSH or Samba to all machines.

Let me know your idea,

B.S,
Masoud.



Re: Killing hadoop jobs automatically

2012-01-30 Thread Masoud

Dear Praveenesh

I think there are only two ways to kill a job:
1- the kill command (not a perfect way, since you have to know the job id)
2- mapred.task.timeout (in the bin/hadoop jar command, use
{-Dmapred.task.timeout=} to set your desired value in msec)

It sometimes happens for me too: not on all machines, but on some particular
machines jobs execute more slowly than on others, I think because of hardware
problems. As far as I know, shuffling is done by hadoop itself, and we can only
influence it by setting the output format class. Be aware that it is normal for
some jobs to finish later than others, so don't be too sensitive about it,
since hadoop manages all of this; the overall result is the goal in
hadoop-based computation.

I hope it could be helpful.

Good Luck,
Masoud,
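For the fully automatic kill Praveenesh describes below, a small watchdog
against the old JobClient API (Hadoop 0.20/1.x) might look like this sketch;
the one-hour limit is an example value and the class name is made up:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class KillLongRunningJobs {
  public static void main(String[] args) throws Exception {
    long limitMs = 60L * 60L * 1000L;                 // example limit: one hour
    JobClient client = new JobClient(new JobConf());  // reads the cluster config
    for (JobStatus status : client.jobsToComplete()) {  // jobs still running
      long runtime = System.currentTimeMillis() - status.getStartTime();
      if (runtime > limitMs) {
        RunningJob job = client.getJob(status.getJobID());
        if (job != null) {
          job.killJob();  // same effect as: hadoop job -kill <jobid>
        }
      }
    }
  }
}

Run periodically (e.g. from cron), this approximates the predefined-timeout
behaviour without having to look up job ids by hand.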


On 01/30/2012 06:07 PM, praveenesh kumar wrote:

@ Harsh -

Yeah, mapred.task.timeout is the valid option, but for some reason it's
not behaving the way it should, and I am not sure what the cause could be.
The thing is, my jobs are running fine; it's just that they are slow at the
shuffling phase sometimes, not every time. So I was thinking, as an admin,
can we control the running of jobs, just as a test, where we simply
kill the jobs that are taking more time to execute -- not only the jobs
that are hanging, but also jobs that are taking more execution time than
expected? The problem in my case is that the end users don't want to go
through the pain of managing/controlling jobs on hadoop. They want all this
job handling to happen automatically, and that made me think in such a
direction (which I know is not the best way).

Anyway, going away from the topic -- is there any way I can
improve my shuffling (through configuration parameters only, knowing
that the users aren't aware of the idea of minimizing the key/value
pairs)?

Thanks,
Praveenesh

On Mon, Jan 30, 2012 at 1:06 PM, Masoud <mas...@agape.hanyang.ac.kr> wrote:


Hi,

Every Map/Reduce app has a Reporter. You can set the configuration
parameter {mapred.task.timeout} of the Reporter to your desired value.

Good Luck.


On 01/30/2012 04:14 PM, praveenesh kumar wrote:


Yeah, I am aware of that, but it requires you to explicitly monitor the job,
look for the jobid, and then run the hadoop job -kill command.
What I want to know is: is there any way to do all this automatically, by
providing some timer or something, so that if my job takes more than
some predefined time, it gets killed automatically?

Thanks,
Praveenesh

On Mon, Jan 30, 2012 at 12:38 PM, Prashant Kommireddi
<prash1...@gmail.com> wrote:

You might want to take a look at the kill command: hadoop job -kill jobid.

Prashant

On Sun, Jan 29, 2012 at 11:06 PM, praveenesh kumar <praveen...@gmail.com>
wrote:

Is there any way through which we can kill hadoop jobs that are taking
too long to execute?

What I want to achieve is: if some job is running for more than
_some_predefined_timeout_limit, it should be killed automatically.

Is it possible to achieve this through shell scripts or any other way?

Thanks,
Praveenesh
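On the shuffle question above, a hedged aside (not advice given in the
thread): the usual knobs in Hadoop of this era were a map-side combiner plus
mapred-site.xml parameters such as the following; the values are only
examples, and whether they help depends on the job.

<property>
  <name>io.sort.mb</name>
  <value>200</value>   <!-- map-side sort buffer in MB, default 100 -->
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <value>10</value>    <!-- parallel map-output fetch threads, default 5 -->
</property>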






Re: Best Linux Operating system used for Hadoop

2012-01-27 Thread Masoud

Hi,

I suggest Fedora; in my opinion it's more powerful than other
distributions.

I have run hadoop on it without any problem.

Good luck

On 01/27/2012 06:15 PM, Sujit Dhamale wrote:

Hi All,
I am new to Hadoop.
Can anyone tell me which is the best Linux operating system for
installing and running Hadoop? These days I am using Ubuntu 11.4; I installed
Hadoop on it, but it has crashed a number of times.

Can someone please help me out?


Kind regards
Sujit Dhamale





map/reduce by C# & Hadoop

2011-12-27 Thread Masoud

Dear All,
Has anyone done this before: map/reduce with C# & Hadoop?

As you know, for developing a map/reduce app in hadoop we should extend and
implement special map and reduce abstract classes and interfaces,
and Hadoop Pipes is for C++, not C#. The question is what we should do
for C#.


*IS IT RIGHT THAT*
1- We just develop our C# code (maybe it's better with MonoDevelop)
according to the map/reduce logic, by developing map and reduce classes.

2- Then we introduce the map and reduce classes to hadoop streaming.

Does hadoop streaming work with a .dll file?

Thanks for your help.
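For what it's worth, a streaming job does not need hadoop's Java classes at
all: the mapper and reducer are just external programs that read lines from
stdin and write key<TAB>value lines to stdout, so a C# executable run under
Mono can be plugged in. A sketch of the invocation (the jar path follows the
0.20.x contrib layout, and the .exe names are made up):

bin/hadoop jar contrib/streaming/hadoop-streaming-0.20.204.0.jar \
  -input /user/masoud/in \
  -output /user/masoud/out \
  -mapper "mono WordCountMapper.exe" \
  -reducer "mono WordCountReducer.exe" \
  -file WordCountMapper.exe \
  -file WordCountReducer.exe

Note that streaming launches an executable command, so a bare .dll cannot be
used directly; it would have to be referenced from an .exe that Mono runs.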


Re: map/reduce by C# & Hadoop

2011-12-27 Thread Masoud

Thanks for your reply ^^
But the question is how we can write map/reduce in C# without using
hadoop's Java abstract classes and interfaces; should we just
write C# code following the map/reduce application logic?
What do you mean by the app being able to run standalone? Do you mean it
should be an .exe or a .dll?

Did you do map/reduce with C# & Hadoop before?
Oh, lots of questions, sorry...

B.S


On 12/27/2011 06:13 PM, Harsh J wrote:

I haven't used Mono but if your written program can run as a standalone and 
read from stdin and write to stdout, then streaming is sufficient to run your 
C# MR programs.

On 27-Dec-2011, at 2:31 PM, Masoud wrote:


Dear All,
Has anyone done this before: map/reduce with C# & Hadoop?

As you know, for developing a map/reduce app in hadoop we should extend and
implement special map and reduce abstract classes and interfaces,
and Hadoop Pipes is for C++, not C#. The question is what we should do for C#.

*IS IT RIGHT THAT*
1- We just develop our C# code (maybe it's better with MonoDevelop) according
to the map/reduce logic, by developing map and reduce classes.
2- Then we introduce the map and reduce classes to hadoop streaming.

Does hadoop streaming work with a .dll file?

Thanks for your help.




Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....

2011-11-03 Thread Masoud

Dear Uma,
As you know, when we use the start-all.sh command, all the output is saved in
log files.
When I check the tasktracker log file, I see the error message below, and the
tasktracker shuts down.
I'm really confused; I have been working on this issue for more than 4 days
and have tried different approaches, but with no result ^^


BS.
Masoud

On 11/03/2011 08:34 PM, Uma Maheswara Rao G 72686 wrote:

It won't display anything on the console.
It only displays output on the console if you get an error while executing the
command; in your case it probably executed successfully.
Are you still facing the same problem with TT startup?

Regards,
Uma
- Original Message -
From: Masoud <mas...@agape.hanyang.ac.kr>
Date: Thursday, November 3, 2011 7:02 am
Subject: Re: under cygwin JUST tasktracker run by cyg_server user, Permission 
denied .
To: common-user@hadoop.apache.org


Hi,
Thanks for the info. I checked that report; it seems the same as mine, but
no specific solution is mentioned.
Yes, I changed this folder's permissions via cygwin, NO RESULT.
I'm really confused...

any idea please ...?

Thanks,
B.S


On 11/01/2011 05:38 PM, Uma Maheswara Rao G 72686 wrote:

Looks like that is a permissions-related issue on the local dirs.
There is an issue filed in mapred related to this problem:

https://issues.apache.org/jira/browse/MAPREDUCE-2921

Can you please set the permissions explicitly and try?

Regards,
Uma
- Original Message -
From: Masoud <mas...@agape.hanyang.ac.kr>
Date: Tuesday, November 1, 2011 1:19 pm
Subject: Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .

To: common-user@hadoop.apache.org


Sure, ^^

When I run {namenode -format} it makes the dfs in
c:/tmp/administrator_hadoop/
After that, by running start-all.sh, everything is OK; all daemons run
except the tasktracker.
My current user is administrator, but the tasktracker runs as the cyg_server
user that was made by cygwin at installation time. This is a part of the log
file:

2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker:
Starting tasktracker with owner as cyg_server
2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Good
mapred local directories are: /tmp/hadoop-cyg_server/mapred/local
2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: Can
not start task tracker because java.io.IOException: Failed to set
permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:680)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:483)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:741)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1463)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3611)

2011-11-01 14:26:54,479 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG:
/


Thanks,
BR.

On 11/01/2011 04:33 PM, Uma Maheswara Rao G 72686 wrote:

Can you please give some trace?
- Original Message -
From: Masoud <mas...@agape.hanyang.ac.kr>
Date: Tuesday, November 1, 2011 11:08 am
Subject: under cygwin JUST tasktracker run by cyg_server user, Permission denied .

To: common-user@hadoop.apache.org


Hi
I have a problem running hadoop under cygwin 1.7:
only the tasktracker is run by the cyg_server user, and this causes some
problems. Any ideas, please?

BS.
Masoud.







Re: Hadoop + cygwin

2011-11-03 Thread Masoud

Dear Joey,

As you know, when installing cygwin on a new version of Windows, a new user
(cyg_server) is created by cygwin for ssh to localhost; my main user is
Administrator.
I ran namenode -format and the dfs was created in /tmp/Administrator-hadoop/ ~~~
But the tasktracker is started by cyg_server and creates
/tmp/cyg_server-hadoop/~~~ again.

I tried these:

 * I installed cygwin again and managed to change cyg_server to
   Administrator. Now the tasktracker is run by Administrator too, but I get
   the same error.
 * I changed hadoop.tmp.dir to a directory inside cygwin; again, the same error.

*I think* the problem is related to Java: it creates HDFS paths based on
Windows paths and permissions. When I checked, the hdfs permission under cygwin
is -, meaning nothing,
and then when the tasktracker runs, it wants to change the permission to the
corresponding Linux permission, and it can't.
I don't know what the solution is. I have been playing with this issue for
around 4 days, but with no result.


BS.
Masoud


On 11/03/2011 08:19 PM, Joey Echeverria wrote:

What are the permissions on \tmp\hadoop-cyg_server\mapred\local\ttprivate?

Which user owns that directory?

Which user are you starting your TaskTracker as?

-Joey

On Wed, Nov 2, 2011 at 9:29 PM, Masoud <mas...@agape.hanyang.ac.kr> wrote:

Hi,

I'm running hadoop 0.20.204 under cygwin 1.7 on Win7, java 1.6.22,
and I got this error:

2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker:
*Can not start task tracker because java.io.IOException: Failed to set
permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate
to 0700 *

I tried different ways; I even set {hadoop.tmp.dir} to the cygwin home
dir too, and got the same result.
The problem is that Java uses the Windows path and file permissions to create
HDFS, and I think that under cygwin, which simulates Linux behaviour, Hadoop
cannot change the Windows permissions to Linux permissions for the folder
mentioned in the error message.
Maybe it's a bug that should be fixed in the source code.

Any idea please.

Thanks,
Masoud.

On 11/01/2011 08:12 PM, Rita wrote:

Why?

The beauty of hadoop is that it's OS agnostic. What is your native operating
system? I am sure you have a version of the JDK and JRE running there.


On Tue, Nov 1, 2011 at 4:53 AM, Masoud <mas...@agape.hanyang.ac.kr> wrote:


Hi

Has anybody run hadoop on cygwin for development purposes?
Did you have any problems running the tasktracker?

Thanks












Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....

2011-11-02 Thread Masoud

Hi,
What do you mean by cygwin being on my path?
I added c:/cygwin/bin and c:/cygwin/usr/sbin to the Windows path...
Did you have this problem too? Let me know how you fixed it.

Thanks,
B.R

On 11/02/2011 01:29 AM, Shevek wrote:

Smells like failure to execute chmod to me; make sure cygwin is on your
path?

On 1 November 2011 01:38, Uma Maheswara Rao G 72686 <mahesw...@huawei.com> wrote:


Looks like that is a permissions-related issue on the local dirs.
There is an issue filed in mapred related to this problem:
https://issues.apache.org/jira/browse/MAPREDUCE-2921

Can you please set the permissions explicitly and try?

Regards,
Uma
- Original Message -
From: Masoud <mas...@agape.hanyang.ac.kr>
Date: Tuesday, November 1, 2011 1:19 pm
Subject: Re: under cygwin JUST tasktracker run by cyg_server user,
Permission denied .
To: common-user@hadoop.apache.org


Sure, ^^

When I run {namenode -format} it makes the dfs in
c:/tmp/administrator_hadoop/
After that, by running start-all.sh, everything is OK; all daemons run
except the tasktracker.
My current user is administrator, but the tasktracker runs as the cyg_server
user that was made by cygwin at installation time. This is a part of the log
file:

2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker:
Starting tasktracker with owner as cyg_server
2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Good
mapred local directories are: /tmp/hadoop-cyg_server/mapred/local
2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: Can
not start task tracker because java.io.IOException: Failed to set
permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:680)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:483)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:741)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1463)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3611)

2011-11-01 14:26:54,479 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG:
/


Thanks,
BR.

On 11/01/2011 04:33 PM, Uma Maheswara Rao G 72686 wrote:

Can you please give some trace?
- Original Message -
From: Masoud <mas...@agape.hanyang.ac.kr>
Date: Tuesday, November 1, 2011 11:08 am
Subject: under cygwin JUST tasktracker run by cyg_server user, Permission denied .

To: common-user@hadoop.apache.org


Hi
I have a problem running hadoop under cygwin 1.7:
only the tasktracker is run by the cyg_server user, and this causes some
problems. Any ideas, please?

BS.
Masoud.







Re: Hadoop + cygwin

2011-11-02 Thread Masoud

Hi,

I'm running hadoop 0.20.204 under cygwin 1.7 on Win7, java 1.6.22,
and I got this error:

2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker:
*Can not start task tracker because java.io.IOException: Failed to set
permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate
to 0700 *

I tried different ways; I even set {hadoop.tmp.dir} to the cygwin
home dir too, and got the same result.
The problem is that Java uses the Windows path and file permissions to
create HDFS, and I think that under cygwin, which simulates Linux behaviour,
Hadoop cannot change the Windows permissions to
Linux permissions for the folder mentioned in the error message.
Maybe it's a bug that should be fixed in the source code.

Any idea please.

Thanks,
Masoud.

On 11/01/2011 08:12 PM, Rita wrote:

Why?

The beauty of hadoop is that it's OS agnostic. What is your native operating
system? I am sure you have a version of the JDK and JRE running there.


On Tue, Nov 1, 2011 at 4:53 AM, Masoud <mas...@agape.hanyang.ac.kr> wrote:


Hi

Has anybody run hadoop on cygwin for development purposes?
Did you have any problems running the tasktracker?

Thanks








Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....

2011-11-02 Thread Masoud

Hi,
Thanks for the info. I checked that report; it seems the same as mine, but no
specific solution is mentioned.

Yes, I changed this folder's permissions via cygwin, NO RESULT.
I'm really confused...

any idea please ...?

Thanks,
B.S


On 11/01/2011 05:38 PM, Uma Maheswara Rao G 72686 wrote:

Looks like that is a permissions-related issue on the local dirs.
There is an issue filed in mapred related to this problem:
https://issues.apache.org/jira/browse/MAPREDUCE-2921

Can you please set the permissions explicitly and try?

Regards,
Uma
- Original Message -
From: Masoud <mas...@agape.hanyang.ac.kr>
Date: Tuesday, November 1, 2011 1:19 pm
Subject: Re: under cygwin JUST tasktracker run by cyg_server user, Permission 
denied .
To: common-user@hadoop.apache.org


Sure, ^^

When I run {namenode -format} it makes the dfs in
c:/tmp/administrator_hadoop/
After that, by running start-all.sh, everything is OK; all daemons run
except the tasktracker.
My current user is administrator, but the tasktracker runs as the cyg_server
user that was made by cygwin at installation time. This is a part of the log
file:

2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker:
Starting tasktracker with owner as cyg_server
2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Good
mapred local directories are: /tmp/hadoop-cyg_server/mapred/local
2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: Can
not start task tracker because java.io.IOException: Failed to set
permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:680)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:483)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:741)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1463)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3611)

2011-11-01 14:26:54,479 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG:
/


Thanks,
BR.

On 11/01/2011 04:33 PM, Uma Maheswara Rao G 72686 wrote:

Can you please give some trace?
- Original Message -
From: Masoud <mas...@agape.hanyang.ac.kr>
Date: Tuesday, November 1, 2011 11:08 am
Subject: under cygwin JUST tasktracker run by cyg_server user, Permission denied .

To: common-user@hadoop.apache.org


Hi
I have a problem running hadoop under cygwin 1.7:
only the tasktracker is run by the cyg_server user, and this causes some
problems. Any ideas, please?

BS.
Masoud.







Re: under cygwin JUST tasktracker run by cyg_server user, Permission denied .....

2011-11-01 Thread Masoud

Sure, ^^

When I run {namenode -format} it makes the dfs in
c:/tmp/administrator_hadoop/
After that, by running start-all.sh, everything is OK; all daemons run
except the tasktracker.
My current user is administrator, but the tasktracker runs as the cyg_server
user that was made by cygwin at installation time. This is a part of the log
file:

2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker:
Starting tasktracker with owner as cyg_server
2011-11-01 14:26:54,463 INFO org.apache.hadoop.mapred.TaskTracker: Good
mapred local directories are: /tmp/hadoop-cyg_server/mapred/local
2011-11-01 14:26:54,479 ERROR org.apache.hadoop.mapred.TaskTracker: Can
not start task tracker because java.io.IOException: Failed to set
permissions of path: \tmp\hadoop-cyg_server\mapred\local\ttprivate to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:680)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:653)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:483)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:318)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
    at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:741)
    at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1463)
    at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3611)

2011-11-01 14:26:54,479 INFO org.apache.hadoop.mapred.TaskTracker:
SHUTDOWN_MSG:
/


Thanks,
BR.

On 11/01/2011 04:33 PM, Uma Maheswara Rao G 72686 wrote:

Can you please give some trace?
- Original Message -
From: Masoud <mas...@agape.hanyang.ac.kr>
Date: Tuesday, November 1, 2011 11:08 am
Subject: under cygwin JUST tasktracker run by cyg_server user, Permission 
denied .
To: common-user@hadoop.apache.org


Hi
I have a problem running hadoop under cygwin 1.7:
only the tasktracker is run by the cyg_server user, and this causes some
problems. Any ideas, please?

BS.
Masoud.





Hadoop + cygwin

2011-11-01 Thread Masoud

Hi

Has anybody run hadoop on cygwin for development purposes?
Did you have any problems running the tasktracker?

Thanks


under cygwin JUST tasktracker run by cyg_server user, Permission denied .....

2011-10-31 Thread Masoud

Hi
I have a problem running hadoop under cygwin 1.7:
only the tasktracker is run by the cyg_server user, and this causes some
problems. Any ideas, please?

BS.
Masoud.


Re: Hadoop + Cygwin , IOException, /TMP dir

2011-10-28 Thread Masoud

Dear Harsh,
I know that, but where can I set it?
I couldn't find the place.
Thanks

On 10/28/2011 04:53 PM, Harsh J wrote:

Masoud,

You can change your temp-files location by overriding hadoop.tmp.dir
with your desired, proper path. Hopefully, that should help you.

On Fri, Oct 28, 2011 at 12:04 PM, Masoud <mas...@agape.hanyang.ac.kr> wrote:

Hi,
I installed cygwin on win7. When I run the hadoop examples, it makes the /tmp
dir in C:/ (the Windows install dir), not in c:/cygwin (the cygwin install
dir), so a java IOException happens.
Any solution?

Thanks,
BS








Re: Hadoop + Cygwin , IOException, /TMP dir

2011-10-28 Thread Masoud

I did it before, but it's not working; I'll try again...
cygwin has gone crazy, really.
I also have another problem adding JAVA_HOME to cygwin.

Thanks

On 10/28/2011 05:09 PM, Harsh J wrote:

Masoud,

You can set hadoop.tmp.dir in core-site.xml inside your
$HADOOP_HOME/conf directory.

On Fri, Oct 28, 2011 at 1:25 PM, Masoud <mas...@agape.hanyang.ac.kr> wrote:

Dear Harsh,
I know that, but where can I set it?
I couldn't find the place.
Thanks

On 10/28/2011 04:53 PM, Harsh J wrote:

Masoud,

You can change your temp-files location by overriding hadoop.tmp.dir
with your desired, proper path. Hopefully, that should help you.

On Fri, Oct 28, 2011 at 12:04 PM, Masoud <mas...@agape.hanyang.ac.kr> wrote:

Hi,
I installed cygwin on win7. When I run the hadoop examples, it makes the /tmp
dir in C:/ (the Windows install dir), not in c:/cygwin (the cygwin install
dir), so a java IOException happens.
Any solution?

Thanks,
BS
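For reference, the override Harsh describes goes in conf/core-site.xml and
might look like this (the path is only an example; any directory writable by
the hadoop user and visible under cygwin should do):

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/Administrator/hadoop-tmp</value>
</property>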












Hadoop tasktracker shutdown in CYGWIN

2011-10-28 Thread Masoud

Hi

When I run hadoop under cygwin, all daemons start well except the tasktracker.
This is the error message from the log file:

ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because
java.io.IOException: Failed to set permissions of path:
\home\Administrator\software\hadoop-tmp\mapred\local\ttprivate to 0700


I have really been going crazy these days because of cygwin ^^

Thanks,
B.S


Hadoop 0.20.204 eclipse plugin

2011-10-27 Thread Masoud

Hi,

I'm trying to set up a hadoop development node on Windows.
As you know, an eclipse plugin was released with Hadoop 0.20.204.
Do you know which version of eclipse this plugin matches? Does it
work on 3.7?


Thanks,
BS.


implicit addressing in hadoop config files

2011-10-19 Thread Masoud

Dear friends,
I have copied my hadoop conf dir from {hadoop_inst_dir} to another place.
When I specify this new place by an explicit (absolute) path, e.g.
/home/masoud/software/hadoop-conf, in {hadoop_inst_dir}/bin/hadoop, everything
is OK, but when I specify it by an implicit (relative) path, e.g.
../../hadoop-conf, hadoop cannot find it.

PS: the new hadoop conf dir is placed exactly two folders above
{hadoop_inst_dir}/bin/


Best Regards,
Masoud.


Re: implicit addressing in hadoop config files

2011-10-19 Thread Masoud

On 10/19/2011 06:19 PM, Masoud wrote:

Dear friends,
I have copied my hadoop conf file from {hadoop_inst_dir} to other 
place. When i define this new place by explicit address -e.g, 
/home/masoud/software/hadoop-conf- in {hadoop_inst_dir}/bin/hadoop 
every thing is OK, but when I de

I found the solution: a little bit of shell coding in the hadoop script ^^
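Presumably the fix resolves the relative path to an absolute one before hadoop
uses it. The thread doesn't show the actual edit, but a sketch along those
lines in bin/hadoop (which already sets a bin variable to the script's own
directory) could be:

# resolve the conf dir relative to the script location
# (an assumed edit, not the actual change from the thread)
HADOOP_CONF_DIR=`cd "$bin/../../hadoop-conf" && pwd`
export HADOOP_CONF_DIR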


Re: difference between development and production platform???

2011-09-28 Thread Hamedani, Masoud
Dear Steve,

Thanks for your useful comments; I completely agree with your idea.
Personally, for more than 10 years I have only been using Fedora, Java,
Java-related tech, and open source software in all of my projects,
but this is a critical situation: all of the current data and apps in our
university's lab are deployed on a Microsoft platform.
We can transfer our data from Windows to Linux, but all of the code is
written in C#; we can connect the C# code to hadoop and run it on Linux too,
but personally I can't guarantee the result.
*SO AS A SUMMARY*:
1- we can only use Linux machines for the production platform,
2- and we can only use Windows as a *development platform* in
pseudo-distributed mode.

AM I RIGHT about 1 and 2? Please correct or verify them.

Thanks,
BS.
Masoud,

2011/9/28 Steve Loughran ste...@apache.org

 On 28/09/11 04:19, Hamedani, Masoud wrote:

 Special Thanks for your help Arko,

 You mean that in Hadoop, the NameNode, DataNodes, JobTracker, TaskTrackers
 and all the cluster nodes should be deployed on Linux machines?
 We have lots of data (on Windows) and code (written in C#) for data
 mining; we want to use Hadoop and make a connection between
 our existing systems and programs and it.
 As you mentioned, we should move all of our data to Linux systems,
 execute the existing C# code on Linux, and only use Windows for
 development, same as before.
 Am I right?


 What is really meant is nobody runs hadoop at scale on Windows.

 Specifically
  -there's an expectation that there is a unix API you can exec
  -some of the operations (e.g. how programs are exec()'d) are optimised for
 linux
  -everyone tests on 50+ node clusters on Linux.

 Why Linux? Stable, low cost. And you can install it on your laptop/desktop
 and develop there too.


 Because everyone uses Linux (or possibly a genuine Unix system like
 Solaris), problems encountered in real systems get found on Linux and fixed.

 If you want to run a production Hadoop cluster on Windows, you are free to
 do so. Just be aware that you may be the first person to do so at scale, so
 you get to find problems first, you get to file the bugs -and because you
 are the only person with these problems and the ability to replicate them-
 you get to fix them.

 Nobody is going to say oh, this patch is for Windows only use, we will
 reject it -at least provided it doesn't have adverse effects on Linux/Unix.
 It's just that nobody else publicly runs Hadoop on Windows. A key step 1
 will be cross compiling all the native code to Windows, which on 0.23+ also
 means protocol buffers. Enjoy.

 Where you will find problems is that even on Win64, Hadoop can't directly
 load or run C# APPs or anything else written to compile against their
 managed runtime (I forget its name). You will have to bridge via streaming,
 and take a performance hit.

 You could also try running the C# code under Mono on Linux; it may or may
 not work. Again, you get to find out and fix the problems -this time with
 the Mono project.

 -Steve



Re: difference between development and production platform???

2011-09-27 Thread Hamedani, Masoud
Special Thanks for your help Arko,

You mean that in Hadoop, the NameNode, DataNodes, JobTracker, TaskTrackers and
all the cluster nodes should be deployed on Linux machines?
We have lots of data (on Windows) and code (written in C#) for data
mining; we want to use Hadoop and make a connection between
our existing systems and programs and it.
As you mentioned, we should move all of our data to Linux systems,
execute the existing C# code on Linux, and only use Windows for
development, same as before.
Am I right?

Thanks,
B.S
Masoud.

2011/9/28 Arko Provo Mukherjee arkoprovomukher...@gmail.com

 Hi,

 A development platform is the system (s) which are used mainly for the
 developers to write / unit test code for the project.

 There are generally NO end users in the Development system.

 Production platform is where the end users actually work and the
 project is generally moved here only after it is tested in one / more
 test platforms.

 Typically, if the developer is the end user, which is the case in some
 situations (even more likely for university projects), there's generally no
 need to make your project run on separate production or test
 system(s).

 The documentation means that you can use Hadoop on Win32 for
 developing your code, but if you then use that code and run
 production boxes on Win32 (i.e. end users using a Win32 Hadoop
 system), that is not supported.

 Correct me guys if I am wrong.

 Thanks & regards
 Arko

 On Tue, Sep 27, 2011 at 9:32 PM, Hamedani, Masoud
 mas...@agape.hanyang.ac.kr wrote:
  Dear Friends,
 
   I'm new to hadoop, working on an important data mining university research
   project, and I saw these sentences in different hadoop-related docs:

   { Win32 is supported as a *development platform*, not as a *production
   platform*, but Linux is supported as both. }

   What's the difference between a *development platform* and a *production
   platform*?
   Does it mean dataNode and nameNode?
 
  Thanks,
  B.S
 



Re: difference between development and production platform???

2011-09-27 Thread Hamedani, Masoud
Thanks for your nice help Arko,
Maybe because I'm new to hadoop I can't get some of the points;
I'm studying the hadoop manual more deeply to get a better understanding.

B.S
Masoud.

2011/9/28 Arko Provo Mukherjee arkoprovomukher...@gmail.com

 Hi,

 You don't necessarily need to execute the C# code on Linux.

 You can write a middleware application to bring the data from the Win
 boxes to the Linux (Hadoop) boxes if you want to.

 Cheers
 Arko

 On Tue, Sep 27, 2011 at 10:19 PM, Hamedani, Masoud
 mas...@agape.hanyang.ac.kr wrote:
  Special Thanks for your help Arko,
 
   You mean that in Hadoop, the NameNode, DataNodes, JobTracker, TaskTrackers
   and all the cluster nodes should be deployed on Linux machines?
   We have lots of data (on Windows) and code (written in C#) for data
   mining; we want to use Hadoop and make a connection between
   our existing systems and programs and it.
   As you mentioned, we should move all of our data to Linux systems,
   execute the existing C# code on Linux, and only use Windows for
   development, same as before.
   Am I right?
 
  Thanks,
  B.S
  Masoud.
 