Re: no jobtracker to stop,no namenode to stop

2013-08-29 Thread NJain
Hi Nikhil,

Appreciate your quick response on this, but the issue still continues. I
believe I have covered all the pointers you mentioned. I am pasting the
relevant portions of the configuration files below so that you can verify.

1. /etc/hosts file: localhost should not be commented out, and add the IP address.
The entry looks like this:
# localhost name resolution is handled within DNS itself.
127.0.0.1   localhost

2. core-site.xml, hdfs://localhost:<port number>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
3. mapred-site.xml, hdfs://localhost:<port number>, mapred.local.dir
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

4. hdfs-site.xml: replication factor should be one;
   include the dfs.name.dir property and the dfs.data.dir property
   (for both properties, check on the net)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>c:/Hadoop/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>c:/Hadoop/data</value>
  </property>
</configuration>


I am getting stuck at:
13/08/30 11:39:26 WARN mapred.JobClient: No job jar file set.  User classes
may not be found. See JobConf(Class) or JobConf#setJar(String).
13/08/30 11:39:26 INFO input.FileInputFormat: Total input paths to process
: 1
13/08/30 11:39:26 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
13/08/30 11:39:26 WARN snappy.LoadSnappy: Snappy native library not loaded
13/08/30 11:39:27 INFO mapred.JobClient: Running job: job_201308301135_0002
13/08/30 11:39:28 INFO mapred.JobClient:  map 0% reduce 0%

My Jobtracker UI looks like this:

Cluster Summary (Heap Size is 120.06 MB/888.94 MB):
Running Map Tasks 0, Running Reduce Tasks 0, Total Submissions 1,
Nodes 0 (http://localhost:50030/machines.jsp?type=active),
Occupied Map Slots 0, Occupied Reduce Slots 0,
Blacklisted Nodes 0 (http://localhost:50030/machines.jsp?type=blacklisted),
Graylisted Nodes 0 (http://localhost:50030/machines.jsp?type=graylisted),
Excluded Nodes 0 (http://localhost:50030/machines.jsp?type=excluded)



I have a feeling that the jobtracker is not able to find the tasktracker,
as the Nodes column shows 0.

Does this ring any bells to you?

Thanks,
Nitesh Jain



On Thu, Aug 29, 2013 at 5:51 PM, Nikhil2405 [via Hadoop Common] 
ml-node+s472056n4024848...@n3.nabble.com wrote:

 Hi Nitesh,

 I think your problem may be in your configuration, so check your files as
 follows:

 1. /etc/hosts file: localhost should not be commented out, and add the IP
 address.
 2. core-site.xml, hdfs://localhost:<port number>
 3. mapred-site.xml, hdfs://localhost:<port number>, mapred.local.dir
 4. hdfs-site.xml: replication factor should be one;
   include the dfs.name.dir property
 and the dfs.data.dir property
 (for both properties, check on the net)

 Thanks

 Nikhil







Re: Sqoop issue related to Hadoop

2013-08-29 Thread bejoy . hadoop
Hi Raj

The easiest way to pull up a task's log is via the JT web UI.

Go to the JT web UI and drill down into the sqoop job. You'll get a list of
failed/killed tasks; your failed task should be in there. Clicking on that task
will give you its logs.

Regards 
Bejoy KS

Sent from remote device, Please excuse typos

-Original Message-
From: Hadoop Raj hadoop...@yahoo.com
Date: Thu, 29 Aug 2013 00:43:59 
To: user@hadoop.apache.org
Reply-To: user@hadoop.apache.org
Subject: Re: Sqoop issue related to Hadoop

Hi Kate,

Where can I find the task attempt log? Can you specify the location please?


Thanks,
Raj

On Aug 28, 2013, at 7:13 PM, Kathleen Ting kathl...@apache.org wrote:

 Raj, in addition to what Abe said, please also send the failed task attempt 
 log
 attempt_201307041900_0463_m_00_0 as well.
 
 Thanks,
 Kate
 
 On Wed, Aug 28, 2013 at 2:25 PM, Abraham Elmahrek a...@cloudera.com wrote:
 Hey Raj,
 
 It seems like the number of fields you have in your data doesn't match the
 number of fields in your RAJ.CUSTOMERS table.
 
 Could you please add --verbose to the beginning of your argument list and
 provide the entire contents here?
 
 -Abe
 
 
 On Wed, Aug 28, 2013 at 9:36 AM, Raj Hadoop hadoop...@yahoo.com wrote:
 
 Hello all,
 
 I am getting an error while using sqoop export ( Load HDFS file to Oracle
 ). I am not sure the issue might be a Sqoop or Hadoop related one. So I am
 sending it to both the dist lists.
 
 I am using -
 
 sqoop export --connect jdbc:oracle:thin:@//dbserv:9876/OKI --table
 RAJ.CUSTOMERS --export-dir /user/hive/warehouse/web_cust --input-null-string
 '\\N' --input-null-non-string '\\N'  --username  --password  -m 1
 --input-fields-terminated-by '\t'
 I am getting the following error -
 
 Warning: /usr/lib/hbase does not exist! HBase imports will fail.
 Please set $HBASE_HOME to the root of your HBase installation.
 Warning: $HADOOP_HOME is deprecated.
 13/08/28 09:42:36 WARN tool.BaseSqoopTool: Setting your password on the
 command-line is insecure. Consider using -P instead.
 13/08/28 09:42:36 INFO manager.SqlManager: Using default fetchSize of 1000
 13/08/28 09:42:36 INFO tool.CodeGenTool: Beginning code generation
 13/08/28 09:42:38 INFO manager.OracleManager: Time zone has been set to
 GMT
 13/08/28 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT
 t.* FROM RAJ.CUSTOMERS t WHERE 1=0
 13/08/28 09:42:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
 /software/hadoop/hadoop/hadoop-1.1.2
 Note:
 /tmp/sqoop-hadoop/compile/c1376f66d2151b48024c54305377c981/RAJ_CUSTOMERS.java
 uses or overrides a deprecated API.
 Note: Recompile with -Xlint:deprecation for details.
 13/08/28 09:42:40 INFO orm.CompilationManager: Writing jar file:
 /tmp/sqoop-hadoop/compile/c1376f66d2151b48024c54305377c981/RAJ.CUSTOMERS.jar
 13/08/28 09:42:40 INFO mapreduce.ExportJobBase: Beginning export of
 RAJ.CUSTOMERS
 13/08/28 09:42:41 INFO manager.OracleManager: Time zone has been set to
 GMT
 13/08/28 09:42:43 INFO input.FileInputFormat: Total input paths to process
 : 1
 13/08/28 09:42:43 INFO input.FileInputFormat: Total input paths to process
 : 1
 13/08/28 09:42:43 INFO util.NativeCodeLoader: Loaded the native-hadoop
 library
 13/08/28 09:42:43 WARN snappy.LoadSnappy: Snappy native library not loaded
 13/08/28 09:42:43 INFO mapred.JobClient: Running job:
 job_201307041900_0463
 13/08/28 09:42:44 INFO mapred.JobClient:  map 0% reduce 0%
 13/08/28 09:42:56 INFO mapred.JobClient:  map 1% reduce 0%
 13/08/28 09:43:00 INFO mapred.JobClient:  map 2% reduce 0%
 13/08/28 09:43:03 INFO mapred.JobClient:  map 4% reduce 0%
 13/08/28 09:43:10 INFO mapred.JobClient:  map 5% reduce 0%
 13/08/28 09:43:13 INFO mapred.JobClient:  map 6% reduce 0%
 13/08/28 09:43:17 INFO mapred.JobClient: Task Id :
 attempt_201307041900_0463_m_00_0, Status : FAILED
 java.io.IOException: Can't export data, please check task tracker logs
at
 org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at
 org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at
 org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:794)
at RAJ_CUSTOMERS.__loadFromFields(RAJ_CUSTOMERS.java:1057)
at RAJ_CUSTOMERS.parse(RAJ_CUSTOMERS.java:876)
at
 

reading input stream

2013-08-29 Thread jamal sasha
Hi,
  Probably a very stupid question.
I have this data in binary format... and the following piece of code works
for me in normal java.


import java.io.File;
import java.io.FileInputStream;
import org.bson.BSONDecoder;   // from the MongoDB Java driver

public class parser {

 public static void main(String[] args) throws Exception {
  String filename = "sample.txt";
  File file = new File(filename);
  FileInputStream fis = new FileInputStream(filename);
  System.out.println("Total file size to read (in bytes) : "
    + fis.available());
  BSONDecoder bson = new BSONDecoder();
  System.out.println(bson.readObject(fis));
 }
}


Now finally the last line is the answer..
Now, I want to implement this on hadoop but the challenge (which I think)
is.. that I am not reading or parsing data line by line.. rather its a
stream of data??? right??
How do i replicate the above code logic.. but in hadoop?


how to find process under node

2013-08-29 Thread suneel hadoop
Hi All,

What I'm trying to do here is capture which process is running under
which node. This is the Unix script I tried:

#!/bin/ksh

Cnt=`cat /users/hadoop/unixtest/nodefilename.txt | wc -l`
cd /users/hadoop/unixtest/
ls -ltr | awk '{print $9}' > list_of_scripts.txt
split -l $Cnt list_of_scripts.txt node_scripts
ls -ltr node_scripts* | awk '{print $9}' > list_of_node_scripts.txt
for i in nodefilename.txt
do
for j in list_of_node_scripts.txt
do
node=$i
script_file=$j
cat $node\n $script_file > $script_file
done
done

exit 0;

but my result should look like below:

node1 node2
- ---
process1 proces3
process2 proces4

can some one please help in this..
thanks in advance..


Re: how to find process under node

2013-08-29 Thread Shekhar Sharma
Are you trying to find the Java processes under a node? Then the simple
thing would be to ssh to the node and run the jps command to get the list
of Java processes.
Regards,
Som Shekhar Sharma
+91-8197243810


On Thu, Aug 29, 2013 at 12:27 PM, suneel hadoop
suneel.bigd...@gmail.com wrote:
 Hi All,

 what im trying out here is to capture the process which is running under
 which node

 this is the unix script which i tried


 #!/bin/ksh


 Cnt=`cat /users/hadoop/unixtest/nodefilename.txt | wc -l`
 cd /users/hadoop/unixtest/
 ls -ltr | awk '{print $9}' > list_of_scripts.txt
 split -l $Cnt list_of_scripts.txt node_scripts
 ls -ltr node_scripts* | awk '{print $9}' > list_of_node_scripts.txt
 for i in nodefilename.txt
 do
 for j in list_of_node_scripts.txt
 do
 node=$i
 script_file=$j
 cat $node\n $script_file > $script_file
 done
 done


 exit 0;



 but my result should look like below:


 node1 node2
 - ---
 process1 proces3
 process2 proces4


 can some one please help in this..
 thanks in advance..


Re: reading input stream

2013-08-29 Thread Shekhar Sharma
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path p = new Path("path of the file which you would like to read from HDFS");
FSDataInputStream iStream = fs.open(p);
String str;
while ((str = iStream.readLine()) != null)
{
System.out.println(str);
}
iStream.close();
Regards,
Som Shekhar Sharma
+91-8197243810


On Thu, Aug 29, 2013 at 12:15 PM, jamal sasha jamalsha...@gmail.com wrote:
 Hi,
   Probably a very stupid question.
 I have this data in binary format... and the following piece of code works
 for me in normal java.


 public class parser {

 public static void main(String[] args) throws Exception {
 String filename = "sample.txt";
 File file = new File(filename);
 FileInputStream fis = new FileInputStream(filename);
 System.out.println("Total file size to read (in bytes) : "
 + fis.available());
 BSONDecoder bson = new BSONDecoder();
 System.out.println(bson.readObject(fis));
 }
 }


 Now finally the last line is the answer..
 Now, I want to implement this on hadoop but the challenge (which I think)
 is.. that I am not reading or parsing data line by line.. rather its a
 stream of data??? right??
 How do i replicate the above code logic.. but in hadoop?


Re: Hadoop client user

2013-08-29 Thread Shekhar Sharma
Put that user in the hadoop group...
And if the user wants to act as a Hadoop client, then the user should be aware
of two properties: fs.default.name, which is the address of the NameNode, and
mapred.job.tracker, which is the address of the JobTracker.
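A minimal sketch of a client-side Java program that sets those two properties
explicitly (the host names and ports below are placeholders, not taken from
this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClientCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder addresses -- point these at the real NameNode and JobTracker.
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        conf.set("mapred.job.tracker", "jobtracker-host:9001");

        // If the client can list the root directory, basic HDFS access works.
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus s : fs.listStatus(new Path("/"))) {
            System.out.println(s.getPath());
        }
    }
}

With those two properties set (usually in the client's core-site.xml and
mapred-site.xml rather than in code), hadoop fs commands and job submission
from that machine should go to the right cluster.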
Regards,
Som Shekhar Sharma
+91-8197243810


On Thu, Aug 29, 2013 at 10:55 AM, Harsh J ha...@cloudera.com wrote:
 The user1 will mainly require a home directory on the HDFS, created by
 the HDFS administrator user ('hadoop' in your case): sudo -u hadoop
 hadoop fs -mkdir /user/user1; sudo -u hadoop hadoop fs -chown
 user1:user1 /user/user1. After this, the user should be able to run
 jobs and manipulate files in their own directory.

 On Thu, Aug 29, 2013 at 10:21 AM, Hadoop Raj hadoop...@yahoo.com wrote:
 Hi,

 I have a hadoop learning environment on a pseudo distributed mode. It is 
 owned by the user 'hadoop'.

 I am trying to get an understanding on how can another user on this box can 
 act as a Hadoop client and able to create HDFS files and run Map Reduce 
 jobs. Say I have a Linux user 'user1'.

 What permissions , privileges and configuration settings are required for 
 'user1' to act as a Hadoop client?

 Thanks,
 Raj



 --
 Harsh J


Re: Sqoop issue related to Hadoop

2013-08-29 Thread Shekhar Sharma
Go inside the $HADOOP_HOME/log/user/history...
Regards,
Som Shekhar Sharma
+91-8197243810


On Thu, Aug 29, 2013 at 10:13 AM, Hadoop Raj hadoop...@yahoo.com wrote:
 Hi Kate,

 Where can I find the task attempt log? Can you specify the location please?


 Thanks,
 Raj

 On Aug 28, 2013, at 7:13 PM, Kathleen Ting kathl...@apache.org wrote:

 Raj, in addition to what Abe said, please also send the failed task attempt 
 log
 attempt_201307041900_0463_m_00_0 as well.

 Thanks,
 Kate

 On Wed, Aug 28, 2013 at 2:25 PM, Abraham Elmahrek a...@cloudera.com wrote:
 Hey Raj,

 It seems like the number of fields you have in your data doesn't match the
 number of fields in your RAJ.CUSTOMERS table.

 Could you please add --verbose to the beginning of your argument list and
 provide the entire contents here?

 -Abe


 On Wed, Aug 28, 2013 at 9:36 AM, Raj Hadoop hadoop...@yahoo.com wrote:

 Hello all,

 I am getting an error while using sqoop export ( Load HDFS file to Oracle
 ). I am not sure the issue might be a Sqoop or Hadoop related one. So I am
 sending it to both the dist lists.

 I am using -

 sqoop export --connect jdbc:oracle:thin:@//dbserv:9876/OKI --table
 RAJ.CUSTOMERS --export-dir /user/hive/warehouse/web_cust 
 --input-null-string
 '\\N' --input-null-non-string '\\N'  --username  --password  -m 1
 --input-fields-terminated-by '\t'
 I am getting the following error -

 Warning: /usr/lib/hbase does not exist! HBase imports will fail.
 Please set $HBASE_HOME to the root of your HBase installation.
 Warning: $HADOOP_HOME is deprecated.
 13/08/28 09:42:36 WARN tool.BaseSqoopTool: Setting your password on the
 command-line is insecure. Consider using -P instead.
 13/08/28 09:42:36 INFO manager.SqlManager: Using default fetchSize of 1000
 13/08/28 09:42:36 INFO tool.CodeGenTool: Beginning code generation
 13/08/28 09:42:38 INFO manager.OracleManager: Time zone has been set to
 GMT
 13/08/28 09:42:38 INFO manager.SqlManager: Executing SQL statement: SELECT
 t.* FROM RAJ.CUSTOMERS t WHERE 1=0
 13/08/28 09:42:38 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
 /software/hadoop/hadoop/hadoop-1.1.2
 Note:
 /tmp/sqoop-hadoop/compile/c1376f66d2151b48024c54305377c981/RAJ_CUSTOMERS.java
 uses or overrides a deprecated API.
 Note: Recompile with -Xlint:deprecation for details.
 13/08/28 09:42:40 INFO orm.CompilationManager: Writing jar file:
 /tmp/sqoop-hadoop/compile/c1376f66d2151b48024c54305377c981/RAJ.CUSTOMERS.jar
 13/08/28 09:42:40 INFO mapreduce.ExportJobBase: Beginning export of
 RAJ.CUSTOMERS
 13/08/28 09:42:41 INFO manager.OracleManager: Time zone has been set to
 GMT
 13/08/28 09:42:43 INFO input.FileInputFormat: Total input paths to process
 : 1
 13/08/28 09:42:43 INFO input.FileInputFormat: Total input paths to process
 : 1
 13/08/28 09:42:43 INFO util.NativeCodeLoader: Loaded the native-hadoop
 library
 13/08/28 09:42:43 WARN snappy.LoadSnappy: Snappy native library not loaded
 13/08/28 09:42:43 INFO mapred.JobClient: Running job:
 job_201307041900_0463
 13/08/28 09:42:44 INFO mapred.JobClient:  map 0% reduce 0%
 13/08/28 09:42:56 INFO mapred.JobClient:  map 1% reduce 0%
 13/08/28 09:43:00 INFO mapred.JobClient:  map 2% reduce 0%
 13/08/28 09:43:03 INFO mapred.JobClient:  map 4% reduce 0%
 13/08/28 09:43:10 INFO mapred.JobClient:  map 5% reduce 0%
 13/08/28 09:43:13 INFO mapred.JobClient:  map 6% reduce 0%
 13/08/28 09:43:17 INFO mapred.JobClient: Task Id :
 attempt_201307041900_0463_m_00_0, Status : FAILED
 java.io.IOException: Can't export data, please check task tracker logs
at
 org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
at
 org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at
 org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:794)
at RAJ_CUSTOMERS.__loadFromFields(RAJ_CUSTOMERS.java:1057)
at RAJ_CUSTOMERS.parse(RAJ_CUSTOMERS.java:876)
at
 org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:83)
... 10 more

 Thanks,
 Raj







Re: how to find process under node

2013-08-29 Thread Pavan Kumar Polineni
Hi Suneel,

Please provide more details, like what you want to print and what files you
are using within the script, so that I can help. Maybe something is wrong
in your script, so I want to check from my end and help you with this case.


On Thu, Aug 29, 2013 at 1:10 PM, Shekhar Sharma shekhar2...@gmail.comwrote:

 Are your trying to find the java process under a node...Then simple
 thing would be to do ssh and run jps command to get the list of java
 process
 Regards,
 Som Shekhar Sharma
 +91-8197243810


 On Thu, Aug 29, 2013 at 12:27 PM, suneel hadoop
 suneel.bigd...@gmail.com wrote:
  Hi All,
 
  what im trying out here is to capture the process which is running under
  which node
 
  this is the unix script which i tried
 
 
  #!/bin/ksh


  Cnt=`cat /users/hadoop/unixtest/nodefilename.txt | wc -l`
  cd /users/hadoop/unixtest/
  ls -ltr | awk '{print $9}' > list_of_scripts.txt
  split -l $Cnt list_of_scripts.txt node_scripts
  ls -ltr node_scripts* | awk '{print $9}' > list_of_node_scripts.txt
  for i in nodefilename.txt
  do
  for j in list_of_node_scripts.txt
  do
  node=$i
  script_file=$j
  cat $node\n $script_file > $script_file
  done
  done


  exit 0;
 
 
 
  but my result should look like below:
 
 
  node1 node2
  - ---
  process1 proces3
  process2 proces4
 
 
  can some one please help in this..
  thanks in advance..




-- 
 Pavan Kumar Polineni


HBase client with security

2013-08-29 Thread Lanati, Matteo
Hi all,

I set up Hadoop (1.2.0), Zookeeper (3.4.5) and HBase (0.94.8-security) with 
security.
HBase works if I launch the shell from the node running the master, but I'd 
like to use it from an external machine.
I prepared one, copying the Hadoop and HBase installation folders and adapting 
the path (indeed I can use the same client to run MR jobs and interact with 
HDFS).
Regarding HBase client configuration:

- hbase-site.xml specifies

  <property>
    <name>hbase.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hbase.rpc.engine</name>
    <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master.hadoop.local,host49.hadoop.local</value>
  </property>

where the zookeeper hosts are reachable and can be solved via DNS. I had to 
specify them otherwise the shell complains about 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
= ConnectionLoss for /hbase/hbaseid

- I have a keytab for the principal I want to use (user running hbase/my 
client hostname@MYREALM), correctly addressed by the file 
hbase/conf/zk-jaas.conf. In hbase-env.sh, the variable HBASE_OPTS points to 
zk-jaas.conf.

Nonetheless, when I issue a command from a HBase shell on the client machine, I 
got an error in the HBase master log

2013-08-29 10:11:30,890 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
listener on 6: readAndProcess threw exception 
org.apache.hadoop.security.AccessControlException: Authentication is required. 
Count of bytes read: 0
org.apache.hadoop.security.AccessControlException: Authentication is required
at 
org.apache.hadoop.hbase.ipc.SecureServer$SecureConnection.readAndProcess(SecureServer.java:435)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

It looks like there's a mismatch between the client and the master regarding 
the authentication mechanism. Note that from the same client machine I can 
launch and use a Zookeeper shell.
What am I missing in the client configuration? Does /etc/krb5.conf play any 
role into this?
Thanks,

Matteo


Matteo Lanati
Distributed Resources Group
Leibniz-Rechenzentrum (LRZ)
Boltzmannstrasse 1
85748   Garching b. München (Germany)
Phone: +49 89 35831 8724



Re: Tutorials that work with modern Hadoop (v1.x.y)?

2013-08-29 Thread Olivier Renault
Have you tried the Hortonworks sandbox? It's a self-contained Hadoop
environment with dataset + tutorials (10ish) on Hive & Pig.

Thanks
Olivier
On 27 Aug 2013 15:53, Andrew Pennebaker apenneba...@42six.com wrote:

 There are a number of Hadoop tutorials and textbooks available, but they
 always seem to target older versions of Hadoop. Does anyone know of good
 tutorials that work with modern Hadoop verions (v1.x.y)?


-- 


Fwd: Pig GROUP operator - Data is shuffled and wind up together for the same grouping key

2013-08-29 Thread Viswanathan J
Appreciate the response.  I'm facing this issue in prod.

-- Forwarded message --
From: Viswanathan J jayamviswanat...@gmail.com
Date: Thu, Aug 29, 2013 at 2:00 PM
Subject: Pig GROUP operator - Data is shuffled and wind up together for the
same grouping key
To: u...@pig.apache.org u...@pig.apache.org


Hi,

I'm using pig version 0.11.0

While using the GROUP operator in Pig, all the data is shuffled so that rows in
different partitions that have the same grouping key wind up together, yet I
got wrong results for the grouping.

While storing the result data, it shares work between multiple
calculations.

How to solve this? Please advice.

-- 
Regards,
Viswa.J



-- 
Regards,
Viswa.J


Re: How to pass parameter to mappers

2013-08-29 Thread Mohammad Tariq
@rab ra : Last line of the para.

An example :

*Job Setup -*
Configuration conf = new Configuration();
conf.set("param", "value");
Job job = new Job(conf);

*Inside mapper -*
Configuration conf = context.getConfiguration();
String paramValue = conf.get("param");
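Put together, a rough sketch of how those two halves fit into one job (class
and parameter names here are made up for illustration, not from the thread):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class ParamExample {

    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String paramValue;

        @Override
        protected void setup(Context context) {
            // Read back the value the driver stored in the job configuration.
            paramValue = context.getConfiguration().get("param", "default");
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(new Text(paramValue), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("param", "value");   // must be set before the Job is constructed
        Job job = new Job(conf, "param example");
        job.setJarByClass(ParamExample.class);
        job.setMapperClass(MyMapper.class);
        // input/output paths, reducer and output types as usual
    }
}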

HTH

Warm Regards,
Tariq
cloudfront.blogspot.com


On Wed, Aug 28, 2013 at 7:05 PM, Shahab Yunus shahab.yu...@gmail.comwrote:

 See here:
 http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Job+Configuration

 Regards,
 Shahab


 On Wed, Aug 28, 2013 at 7:59 AM, rab ra rab...@gmail.com wrote:

 Hello

 Any hint on how to pass parameters to mappers in 1.2.1 hadoop release?





Re: Simplifying MapReduce API

2013-08-29 Thread Mohammad Tariq
Just to add to the above comments, you just have to extend the classes *
Mapper* and *Reducer* as per the new API.
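For what it's worth, nothing stops you from keeping both in one source file
either; a rough sketch (class names are illustrative only) with the Mapper and
Reducer as nested static classes of a single job class:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class SingleFileJob {

    // Map and reduce logic live in one source file, but are still two classes
    // as the framework expects.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                context.write(new Text(token), ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }
}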

Warm Regards,
Tariq
cloudfront.blogspot.com


On Wed, Aug 28, 2013 at 1:26 AM, Don Nelson dieseld...@gmail.com wrote:

 I agree with @Shahab - it's simple enough to declare both interfaces in
 one class if that's what you want to do.  But given the distributed
 behavior of Hadoop, it's likely that your mappers will be running on
 different nodes than your reducers anyway - why ship around duplicate code?


 On Tue, Aug 27, 2013 at 9:48 AM, Shahab Yunus shahab.yu...@gmail.comwrote:

 For starters (experts might have more complex reasons), what if your
 respective map and reduce logic becomes complex enough to demand separate
 classes? Why tie the clients to implement both by moving these in one Job
 interface. In the current design you can always implement both (map and
 reduce) interfaces if your logic is simple enough and go the other route,
 of separate classes if that is required. I think it is more flexible this
 way (you can always build up from and on top of granular design, rather
 than other way around.)

 I hope I understood your concern correctly...

 Regards,
 Shahab


 On Tue, Aug 27, 2013 at 11:35 AM, Andrew Pennebaker 
 apenneba...@42six.com wrote:

 There seems to be an abundance of boilerplate patterns in MapReduce:

 * Write a class extending Map (1), implementing Mapper (2), with a map
 method (3)
 * Write a class extending Reduce (4), implementing Reducer (5), with a
 reduce method (6)

 Could we achieve the same behavior with a single Job interface requiring
 map() and reduce() methods?





 --

 A child of five could understand this.  Fetch me a child of five.



Re: Tutorials that work with modern Hadoop (v1.x.y)?

2013-08-29 Thread Andrew Pennebaker
In the mean time, I was able to cobble together a working wordcount job.
Hardest parts were installing hadoop and configuring the classpath.

https://github.com/mcandre/hadoop-docs-tutorial#hadoop-docs-tutorial---distributed-wc


On Thu, Aug 29, 2013 at 4:44 AM, Olivier Renault
orena...@hortonworks.comwrote:

 Have you tried the Hortonworks sandbox? It's a self-contained Hadoop
 environment with dataset + tutorials (10ish) on Hive & Pig.

 Thanks
 Olivier
 On 27 Aug 2013 15:53, Andrew Pennebaker apenneba...@42six.com wrote:

 There are a number of Hadoop tutorials and textbooks available, but they
 always seem to target older versions of Hadoop. Does anyone know of good
 tutorials that work with modern Hadoop verions (v1.x.y)?




RE: HBase client with security

2013-08-29 Thread Devaraj k
Please ask this question in u...@hbase.apache.org, you would get better 
response there. 

Thanks
Devaraj k


-Original Message-
From: Lanati, Matteo [mailto:matteo.lan...@lrz.de] 
Sent: 29 August 2013 14:03
To: user@hadoop.apache.org
Subject: HBase client with security

Hi all,

I set up Hadoop (1.2.0), Zookeeper (3.4.5) and HBase (0.94.8-security) with 
security.
HBase works if I launch the shell from the node running the master, but I'd 
like to use it from an external machine.
I prepared one, copying the Hadoop and HBase installation folders and adapting 
the path (indeed I can use the same client to run MR jobs and interact with 
HDFS).
Regarding HBase client configuration:

- hbase-site.xml specifies

  <property>
    <name>hbase.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hbase.rpc.engine</name>
    <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master.hadoop.local,host49.hadoop.local</value>
  </property>

where the zookeeper hosts are reachable and can be solved via DNS. I had to 
specify them otherwise the shell complains about 
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode 
= ConnectionLoss for /hbase/hbaseid

- I have a keytab for the principal I want to use (user running hbase/my 
client hostname@MYREALM), correctly addressed by the file 
hbase/conf/zk-jaas.conf. In hbase-env.sh, the variable HBASE_OPTS points to 
zk-jaas.conf.

Nonetheless, when I issue a command from a HBase shell on the client machine, I 
got an error in the HBase master log

2013-08-29 10:11:30,890 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
listener on 6: readAndProcess threw exception 
org.apache.hadoop.security.AccessControlException: Authentication is required. 
Count of bytes read: 0
org.apache.hadoop.security.AccessControlException: Authentication is required
at 
org.apache.hadoop.hbase.ipc.SecureServer$SecureConnection.readAndProcess(SecureServer.java:435)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
at 
org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

It looks like there's a mismatch between the client and the master regarding 
the authentication mechanism. Note that from the same client machine I can 
launch and use a Zookeeper shell.
What am I missing in the client configuration? Does /etc/krb5.conf play any 
role into this?
Thanks,

Matteo


Matteo Lanati
Distributed Resources Group
Leibniz-Rechenzentrum (LRZ)
Boltzmannstrasse 1
85748   Garching b. München (Germany)
Phone: +49 89 35831 8724



Re: HBase client with security

2013-08-29 Thread Lanati, Matteo
Hi Devaraj,

you're right, I just subscribed, sorry for the spam.

Matteo


On Aug 29, 2013, at 3:31 PM, Devaraj k devara...@huawei.com wrote:

 Please ask this question in u...@hbase.apache.org, you would get better 
 response there. 
 
 Thanks
 Devaraj k
 
 
 -Original Message-
 From: Lanati, Matteo [mailto:matteo.lan...@lrz.de] 
 Sent: 29 August 2013 14:03
 To: user@hadoop.apache.org
 Subject: HBase client with security
 
 Hi all,
 
 I set up Hadoop (1.2.0), Zookeeper (3.4.5) and HBase (0.94.8-security) with 
 security.
 HBase works if I launch the shell from the node running the master, but I'd 
 like to use it from an external machine.
 I prepared one, copying the Hadoop and HBase installation folders and 
 adapting the path (indeed I can use the same client to run MR jobs and 
 interact with HDFS).
 Regarding HBase client configuration:
 
 - hbase-site.xml specifies
 
  <property>
    <name>hbase.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hbase.rpc.engine</name>
    <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master.hadoop.local,host49.hadoop.local</value>
  </property>
 
 where the zookeeper hosts are reachable and can be solved via DNS. I had to 
 specify them otherwise the shell complains about 
 org.apache.zookeeper.KeeperException$ConnectionLossException: 
 KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
 
 - I have a keytab for the principal I want to use (user running hbase/my 
 client hostname@MYREALM), correctly addressed by the file 
 hbase/conf/zk-jaas.conf. In hbase-env.sh, the variable HBASE_OPTS points to 
 zk-jaas.conf.
 
 Nonetheless, when I issue a command from a HBase shell on the client machine, 
 I got an error in the HBase master log
 
 2013-08-29 10:11:30,890 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server 
 listener on 6: readAndProcess threw exception 
 org.apache.hadoop.security.AccessControlException: Authentication is 
 required. Count of bytes read: 0
 org.apache.hadoop.security.AccessControlException: Authentication is required
   at 
 org.apache.hadoop.hbase.ipc.SecureServer$SecureConnection.readAndProcess(SecureServer.java:435)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener.doRead(HBaseServer.java:748)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.doRunLoop(HBaseServer.java:539)
   at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Listener$Reader.run(HBaseServer.java:514)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
 
 It looks like there's a mismatch between the client and the master regarding 
 the authentication mechanism. Note that from the same client machine I can 
 launch and use a Zookeeper shell.
 What am I missing in the client configuration? Does /etc/krb5.conf play any 
 role into this?
 Thanks,
 
 Matteo
 
 
 Matteo Lanati
 Distributed Resources Group
 Leibniz-Rechenzentrum (LRZ)
 Boltzmannstrasse 1
 85748 Garching b. München (Germany)
 Phone: +49 89 35831 8724
 

Matteo Lanati
Distributed Resources Group
Leibniz-Rechenzentrum (LRZ)
Boltzmannstrasse 1
85748   Garching b. München (Germany)
Phone: +49 89 35831 8724



Re: Hadoop Yarn - samples

2013-08-29 Thread Arun C Murthy
Take a look at the dist-shell example in 
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/

I recently wrote up another simplified version of it for illustration purposes 
here: https://github.com/hortonworks/simple-yarn-app

Arun

On Aug 28, 2013, at 4:47 AM, Manickam P manicka...@outlook.com wrote:

 Hi,
 
 I have just installed Hadoop 2.0.5 alpha version.
 I want to analyse how the Yarn resource manager and node managers work.
 I executed the map reduce examples but I want to execute the samples in Yarn.
 Searching for that but unable to find any. Please help me.
 
 
 
 Thanks,
 Manickam P

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



-- 


Re: Hadoop Yarn - samples

2013-08-29 Thread Punnoose, Roshan
Is there an example of running a sample yarn application that will only allow 
one container to start per host?

Punnoose, Roshan
rashan.punnr...@merck.commailto:rashan.punnr...@merck.com



On Aug 29, 2013, at 10:08 AM, Arun C Murthy 
a...@hortonworks.commailto:a...@hortonworks.com wrote:

Take a look at the dist-shell example in 
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/

I recently wrote up another simplified version of it for illustration purposes 
here: https://github.com/hortonworks/simple-yarn-app

Arun

On Aug 28, 2013, at 4:47 AM, Manickam P 
manicka...@outlook.commailto:manicka...@outlook.com wrote:

Hi,

I have just installed Hadoop 2.0.5 alpha version.
I want to analyse how the Yarn resource manager and node mangers works.
I executed the map reduce examples but i want to execute the samples in Yarn. 
Searching for that but unable to find any.  Please help me.



Thanks,
Manickam P

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/






Is hadoop thread safe?

2013-08-29 Thread Pavan Sudheendra
Hi all,

Is hadoop thread safe? Do mappers make use of threads in any chance? A
little bit of information on the way they execute in parallel would help me
out. Thanks.

Regards,
Pavan


Re: Is hadoop thread safe?

2013-08-29 Thread Adam Muise
Mappers don't communicate with each other in traditional MapReduce. If you
need something more MPI-ish then look to MPI over YARN or write your own
YARN app.

If you need multi-threading within the mapper then it is up to you as the
java developer to make it thread safe. Use the concurrent libraries like
anything else and Bob's your uncle. Having overly-complicated mappers can
be difficult to manage however and it kind of misses the mark for MapReduce
problems.

Maybe if you expand on your use case a bit someone here can provide
specific advice.


On Thu, Aug 29, 2013 at 10:33 AM, Pavan Sudheendra pavan0...@gmail.comwrote:

 Hi all,

 Is hadoop thread safe? Do mappers make use of threads in any chance? A
 little bit of information on the way they execute in parallel would help me
 out. Thanks.

 Regards,
 Pavan




-- 
*
*
*
*
*Adam Muise*
Solution Engineer
*Hortonworks*
amu...@hortonworks.com
416-417-4037

Hortonworks - Develops, Distributes and Supports Enterprise Apache
Hadoop.http://hortonworks.com/

Hortonworks Virtual Sandbox http://hortonworks.com/sandbox

Hadoop: Disruptive Possibilities by Jeff
Needhamhttp://hortonworks.com/resources/?did=72cat=1

-- 


Re: Is hadoop thread safe?

2013-08-29 Thread Pavan Sudheendra
No, I had written a huge Map Reduce program which talks with HBase and does
a lot of computing, using it as a source as well as a sink.. One of my
colleagues saw my code and noticed that I had used a lot of static functions
instead of making use of proper OOP concepts.. He was telling me that it
shouldn't be the way I go about doing it.. But my code works fine..
So I was wondering whether I will face any problems in the future because of
this.. That's all.

Regards,
Pavan
On Aug 29, 2013 8:11 PM, Adam Muise amu...@hortonworks.com wrote:

 Mappers don't communicate with each other in traditional MapReduce. If you
 need something more MPI-ish then look to MPI over YARN or write your own
 YARN app.

 If you need multi-threading within the mapper then it is up to you as the
 java developer to make it thread safe. Use the concurrent libraries like
 anything else and Bob's your uncle. Having overly-complicated mappers can
 be difficult to manage however and it kind of misses the mark for MapReduce
 problems.

 Maybe if you expand on your use case a bit someone here can provide
 specific advice.


 On Thu, Aug 29, 2013 at 10:33 AM, Pavan Sudheendra pavan0...@gmail.comwrote:

 Hi all,

 Is hadoop thread safe? Do mappers make use of threads in any chance? A
 little bit of information on the way they execute in parallel would help me
 out. Thanks.

 Regards,
 Pavan




 --
 *
 *
 *
 *
 *Adam Muise*
 Solution Engineer
 *Hortonworks*
 amu...@hortonworks.com
 416-417-4037

 Hortonworks - Develops, Distributes and Supports Enterprise Apache 
 Hadoop.http://hortonworks.com/

 Hortonworks Virtual Sandbox http://hortonworks.com/sandbox

 Hadoop: Disruptive Possibilities by Jeff 
 Needhamhttp://hortonworks.com/resources/?did=72cat=1



Re: reading input stream

2013-08-29 Thread jamal sasha
Wait.. this is something new to me..
Does this go in the driver setup??? or the mapper??
Can you elaborate a bit on this??
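For reference, a rough sketch (not from this thread; class, property and path
names are made up, and it assumes the binary file is read as a side input
inside setup()) of doing the same read from within a Mapper:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HdfsStreamMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void setup(Context context) throws IOException {
        Configuration conf = context.getConfiguration();
        // Path of the side file to read; normally passed in via the configuration.
        Path p = new Path(conf.get("binary.input.path", "/user/hadoop/sample.bin"));
        FileSystem fs = p.getFileSystem(conf);
        FSDataInputStream in = fs.open(p);
        try {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) > 0) {
                // feed buf[0..n) to whatever decoder understands the format,
                // e.g. the BSON decoder from the standalone program above
            }
        } finally {
            in.close();
        }
    }
}

Whether this belongs in the driver or a mapper depends on who needs the data;
if each record of the binary file should become one map input, a custom
InputFormat/RecordReader is the usual route instead.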


On Thu, Aug 29, 2013 at 12:43 AM, Shekhar Sharma shekhar2...@gmail.comwrote:

 Configuration conf = new Configuration();
 FileSystem fs = FileSystem.get(conf);
 Path p = new Path("path of the file which you would like to read from HDFS");
 FSDataInputStream iStream = fs.open(p);
 String str;
 while ((str = iStream.readLine()) != null)
 {
 System.out.println(str);

 }
 Regards,
 Som Shekhar Sharma
 +91-8197243810


 On Thu, Aug 29, 2013 at 12:15 PM, jamal sasha jamalsha...@gmail.com
 wrote:
  Hi,
Probably a very stupid question.
  I have this data in binary format... and the following piece of code
 works
  for me in normal java.
 
 
  public class parser {

  public static void main(String[] args) throws Exception {
  String filename = "sample.txt";
  File file = new File(filename);
  FileInputStream fis = new FileInputStream(filename);
  System.out.println("Total file size to read (in bytes) : "
  + fis.available());
  BSONDecoder bson = new BSONDecoder();
  System.out.println(bson.readObject(fis));
  }
  }
 
 
  Now finally the last line is the answer..
  Now, I want to implement this on hadoop but the challenge (which I think)
  is.. that I am not reading or parsing data line by line.. rather its a
  stream of data??? right??
  How do i replicate the above code logic.. but in hadoop?



[yarn] job is not getting assigned

2013-08-29 Thread Andre Kelpe
Hi,

I am in the middle of setting up a hadoop 2 cluster. I am using the hadoop
2.1-beta tarball.

My cluster has 1 master node running the hdfs namenode, the resourcemanger
and the job history server. Next to that I have  3 nodes acting as
datanodes and nodemanagers.

In order to test, if everything is working, I submitted the teragen job
from the hadoop-examples jar like this:

$ hadoop jar
$HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar
teragen 1000 /user/vagrant/teragen

The job starts up and I  get the following output:

13/08/29 14:42:46 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
13/08/29 14:42:47 INFO client.RMProxy: Connecting to ResourceManager at
master.local/192.168.7.10:8032
13/08/29 14:42:48 INFO terasort.TeraSort: Generating 1000 using 2
13/08/29 14:42:48 INFO mapreduce.JobSubmitter: number of splits:2
13/08/29 14:42:48 WARN conf.Configuration: user.name is deprecated.
Instead, use mapreduce.job.user.name
13/08/29 14:42:48 WARN conf.Configuration: mapred.jar is deprecated.
Instead, use mapreduce.job.jar
13/08/29 14:42:48 WARN conf.Configuration: mapred.reduce.tasks is
deprecated. Instead, use mapreduce.job.reduces
13/08/29 14:42:48 WARN conf.Configuration: mapred.output.value.class is
deprecated. Instead, use mapreduce.job.output.value.class
13/08/29 14:42:48 WARN conf.Configuration: mapreduce.map.class is
deprecated. Instead, use mapreduce.job.map.class
13/08/29 14:42:48 WARN conf.Configuration: mapred.job.name is deprecated.
Instead, use mapreduce.job.name
13/08/29 14:42:48 WARN conf.Configuration: mapreduce.inputformat.class is
deprecated. Instead, use mapreduce.job.inputformat.class
13/08/29 14:42:48 WARN conf.Configuration: mapred.output.dir is deprecated.
Instead, use mapreduce.output.fileoutputformat.outputdir
13/08/29 14:42:48 WARN conf.Configuration: mapreduce.outputformat.class is
deprecated. Instead, use mapreduce.job.outputformat.class
13/08/29 14:42:48 WARN conf.Configuration: mapred.map.tasks is deprecated.
Instead, use mapreduce.job.maps
13/08/29 14:42:48 WARN conf.Configuration: mapred.output.key.class is
deprecated. Instead, use mapreduce.job.output.key.class
13/08/29 14:42:48 WARN conf.Configuration: mapred.working.dir is
deprecated. Instead, use mapreduce.job.working.dir
13/08/29 14:42:49 INFO mapreduce.JobSubmitter: Submitting tokens for job:
job_1377787324271_0001
13/08/29 14:42:50 INFO impl.YarnClientImpl: Submitted application
application_1377787324271_0001 to ResourceManager at master.local/
192.168.7.10:8032
13/08/29 14:42:50 INFO mapreduce.Job: The url to track the job:
http://master.local:8088/proxy/application_1377787324271_0001/
13/08/29 14:42:50 INFO mapreduce.Job: Running job: job_1377787324271_0001

and then it stops. If I check the UI, I see this:

application_1377787324271_0001 (http://master.local:8088/cluster/app/application_1377787324271_0001)
user: vagrant, name: TeraGen, type: MAPREDUCE, queue: default,
start time: Thu, 29 Aug 2013 14:42:49 GMT, finish time: N/A,
state: ACCEPTED, final status: UNDEFINED, tracking UI: UNASSIGNED
(http://master.local:8088/cluster/apps#)
I have no idea, why it is not starting, nor what to look for. Any pointers
are more than welcome!

Thanks!

- André

-- 
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com


RE: Hadoop Yarn - samples

2013-08-29 Thread Manickam P
Hi Arun,
Thanks for your reply. Actually I've installed Apache Hadoop. The samples you
shared look like Hortonworks ones, so will they work fine for me? I have a
doubt about this, so I'm asking here.

Thanks,
Manickam P

From: a...@hortonworks.com
Subject: Re: Hadoop Yarn - samples
Date: Thu, 29 Aug 2013 07:08:00 -0700
To: user@hadoop.apache.org

Take a look at the dist-shell example in 
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/
I recently wrote up another simplified version of it for illustration purposes 
here: https://github.com/hortonworks/simple-yarn-app
Arun
On Aug 28, 2013, at 4:47 AM, Manickam P manicka...@outlook.com wrote:

Hi,
I have just installed Hadoop 2.0.5 alpha version. I want to analyse how the
Yarn resource manager and node managers work. I executed the map reduce
examples but I want to execute the samples in Yarn. Searching for that but
unable to find any. Please help me.

Thanks,
Manickam P

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/








Re: Is hadoop thread safe?

2013-08-29 Thread Harsh J
Map tasks run in parallel in separately spawned JVMs, so they are isolated
from one another at runtime. Use of static functions shouldn't affect you
generally.

Default Map I/O is single-threaded. If you plan to use multiple threads,
use MultithreadedMapper for proper thread-safety.
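For illustration, a minimal sketch (the inner mapper below is a made-up
placeholder, not from this thread) of wiring MultithreadedMapper from the new
API's lib classes into a driver:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultiThreadedJob {

    // The real map logic; MultithreadedMapper runs several map threads per task,
    // so any shared (e.g. static) state it touches must be thread-safe.
    public static class InnerMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, new Text(""));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multithreaded example");
        job.setJarByClass(MultiThreadedJob.class);
        // The framework-level mapper is MultithreadedMapper; it delegates to InnerMapper.
        job.setMapperClass(MultithreadedMapper.class);
        MultithreadedMapper.setMapperClass(job, InnerMapper.class);
        MultithreadedMapper.setNumberOfThreads(job, 4);
        // input/output formats and paths as usual
    }
}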

On Thu, Aug 29, 2013 at 8:15 PM, Pavan Sudheendra pavan0...@gmail.com wrote:
 No, I had written a huge Map Reduce program which talks with hbase and does
 a lot of computing using it as a source as well as sink.. One of my
 colleague saw my code and saw that I had used a lot of static function
 instead of making use of proper OOP concepts.. He was telling me that it
 shouldn't be the way I should go about doing it.. But my code works fine..
 So, was wondering will I face any problem in the future because of this..
 That's all.

 Regards,
 Pavan

 On Aug 29, 2013 8:11 PM, Adam Muise amu...@hortonworks.com wrote:

 Mappers don't communicate with each other in traditional MapReduce. If you
 need something more MPI-ish then look to MPI over YARN or write your own
 YARN app.

 If you need multi-threading within the mapper then it is up to you as the
 java developer to make it thread safe. Use the concurrent libraries like
 anything else and Bob's your uncle. Having overly-complicated mappers can be
 difficult to manage however and it kind of misses the mark for MapReduce
 problems.

 Maybe if you expand on your use case a bit someone here can provide
 specific advice.


 On Thu, Aug 29, 2013 at 10:33 AM, Pavan Sudheendra pavan0...@gmail.com
 wrote:

 Hi all,

 Is hadoop thread safe? Do mappers make use of threads in any chance? A
 little bit of information on the way they execute in parallel would help me
 out. Thanks.

 Regards,
 Pavan




 --


 Adam Muise
 Solution Engineer
 Hortonworks
 amu...@hortonworks.com
 416-417-4037

 Hortonworks - Develops, Distributes and Supports Enterprise Apache Hadoop.

 Hortonworks Virtual Sandbox

 Hadoop: Disruptive Possibilities by Jeff Needham




-- 
Harsh J


Re: Hadoop client user

2013-08-29 Thread Raj Hadoop


Thanks Harsh. That is a very good explanation.

I am trying to understand how, in a production cluster, the hadoop user and
hadoop clients would be set up.

What users should exist on the NN, JT and DN?

Regards,
Rajendra



 From: Harsh J ha...@cloudera.com
To: user@hadoop.apache.org user@hadoop.apache.org 
Sent: Thursday, August 29, 2013 1:25 AM
Subject: Re: Hadoop client user
 

The user1 will mainly require a home directory on the HDFS, created by
the HDFS administrator user ('hadoop' in your case): sudo -u hadoop
hadoop fs -mkdir /user/user1; sudo -u hadoop hadoop fs -chown
user1:user1 /user/user1. After this, the user should be able to run
jobs and manipulate files in their own directory.

On Thu, Aug 29, 2013 at 10:21 AM, Hadoop Raj hadoop...@yahoo.com wrote:
 Hi,

 I have a hadoop learning environment on a pseudo distributed mode. It is 
 owned by the user 'hadoop'.

 I am trying to get an understanding on how can another user on this box can 
 act as a Hadoop client and able to create HDFS files and run Map Reduce jobs. 
 Say I have a Linux user 'user1'.

 What permissions , privileges and configuration settings are required for 
 'user1' to act as a Hadoop client?

 Thanks,
 Raj



-- 
Harsh J

Re: [yarn] job is not getting assigned

2013-08-29 Thread Vinod Kumar Vavilapalli

This usually means there are no available resources as seen by the 
ResourceManager. Do you see Active Nodes on the RM web UI first page? If not, 
you'll have to check the NodeManager logs to see if they crashed for some 
reason.

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Aug 29, 2013, at 7:52 AM, Andre Kelpe wrote:

 Hi,
 
 I am in the middle of setting up a hadoop 2 cluster. I am using the hadoop 
 2.1-beta tarball. 
 
 My cluster has 1 master node running the hdfs namenode, the resourcemanger 
 and the job history server. Next to that I have  3 nodes acting as datanodes 
 and nodemanagers.
 
 In order to test, if everything is working, I submitted the teragen job from 
 the hadoop-examples jar like this:
 
 $ hadoop jar 
 $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.1.0-beta.jar
  teragen 1000 /user/vagrant/teragen
 
 The job starts up and I  get the following output:
 
 13/08/29 14:42:46 WARN util.NativeCodeLoader: Unable to load native-hadoop 
 library for your platform... using builtin-java classes where applicable
 13/08/29 14:42:47 INFO client.RMProxy: Connecting to ResourceManager at 
 master.local/192.168.7.10:8032
 13/08/29 14:42:48 INFO terasort.TeraSort: Generating 1000 using 2
 13/08/29 14:42:48 INFO mapreduce.JobSubmitter: number of splits:2
 13/08/29 14:42:48 WARN conf.Configuration: user.name is deprecated. Instead, 
 use mapreduce.job.user.name
 13/08/29 14:42:48 WARN conf.Configuration: mapred.jar is deprecated. Instead, 
 use mapreduce.job.jar
 13/08/29 14:42:48 WARN conf.Configuration: mapred.reduce.tasks is deprecated. 
 Instead, use mapreduce.job.reduces
 13/08/29 14:42:48 WARN conf.Configuration: mapred.output.value.class is 
 deprecated. Instead, use mapreduce.job.output.value.class
 13/08/29 14:42:48 WARN conf.Configuration: mapreduce.map.class is deprecated. 
 Instead, use mapreduce.job.map.class
 13/08/29 14:42:48 WARN conf.Configuration: mapred.job.name is deprecated. 
 Instead, use mapreduce.job.name
 13/08/29 14:42:48 WARN conf.Configuration: mapreduce.inputformat.class is 
 deprecated. Instead, use mapreduce.job.inputformat.class
 13/08/29 14:42:48 WARN conf.Configuration: mapred.output.dir is deprecated. 
 Instead, use mapreduce.output.fileoutputformat.outputdir
 13/08/29 14:42:48 WARN conf.Configuration: mapreduce.outputformat.class is 
 deprecated. Instead, use mapreduce.job.outputformat.class
 13/08/29 14:42:48 WARN conf.Configuration: mapred.map.tasks is deprecated. 
 Instead, use mapreduce.job.maps
 13/08/29 14:42:48 WARN conf.Configuration: mapred.output.key.class is 
 deprecated. Instead, use mapreduce.job.output.key.class
 13/08/29 14:42:48 WARN conf.Configuration: mapred.working.dir is deprecated. 
 Instead, use mapreduce.job.working.dir
 13/08/29 14:42:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
 job_1377787324271_0001
 13/08/29 14:42:50 INFO impl.YarnClientImpl: Submitted application 
 application_1377787324271_0001 to ResourceManager at 
 master.local/192.168.7.10:8032
 13/08/29 14:42:50 INFO mapreduce.Job: The url to track the job: 
 http://master.local:8088/proxy/application_1377787324271_0001/
 13/08/29 14:42:50 INFO mapreduce.Job: Running job: job_1377787324271_0001
 
 and then it stops. If I check the UI, I see this:
 
 ID: application_1377787324271_0001   User: vagrant   Name: TeraGen   Application Type: MAPREDUCE   Queue: default
 StartTime: Thu, 29 Aug 2013 14:42:49 GMT   FinishTime: N/A   State: ACCEPTED   FinalStatus: UNDEFINED   Tracking UI: UNASSIGNED
 
 I have no idea why it is not starting, nor what to look for. Any pointers 
 are more than welcome!
 
 Thanks!
 
 - André
 
 -- 
 André Kelpe
 an...@concurrentinc.com
 http://concurrentinc.com




Multidata center support

2013-08-29 Thread Baskar Duraikannu
We have a need to set up Hadoop across data centers. Does Hadoop support a 
multi-data-center configuration? I searched through the archives and found that 
Hadoop did not support multi-data-center configurations some time back. Just 
wanted to see whether the situation has changed.
Please help.

Hadoop HA error JOURNAL is not supported in state standby

2013-08-29 Thread orahad bigdata
Hi,

I'm facing an error while starting Hadoop in an HA (2.0.5) cluster; both
NameNodes started in standby mode and are not changing state.

When I tried to do a health check through "hdfs haadmin -checkHealth
<serviceId>", it gave me the error below.

Failed on local exception:
com.google.protobuf.InvalidProtocolBufferException: Message missing
required fields: callId, status; Host Details : local host is:
clone2/XX.XX.XX.XX; destination host is: clone1:8020;

 I checked the logs at NN side.

2013-08-30 00:49:16,074 ERROR
org.apache.hadoop.security.UserGroupInformation:
PriviledgedActionException as:hadoop (auth:SIMPLE)
cause:org.apache.hadoop.ipc.StandbyException: Operation category
JOURNAL is not supported in state standby
2013-08-30 00:49:16,074 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 8020, call
org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog
from 192.168.126.31:48266: error:
org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is
not supported in state standby
2013-08-30 00:49:32,391 INFO
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering
log roll on remote NameNode clone2:8020
2013-08-30 00:49:32,403 WARN
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to
trigger a roll of the active NN
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
Operation category JOURNAL is not supported in state standby
at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1411)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:859)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4445)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:766)
at 
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
at 
org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8758)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

at org.apache.hadoop.ipc.Client.call(Client.java:1235)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
at $Proxy11.rollEditLog(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:139)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:268)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:310)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
at 
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
at 
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)

Did I miss something?

Thanks


copy files from hdfs to local fs

2013-08-29 Thread Chengi Liu
Ok,

  A very stupid question...

I have a large file in

/user/input/foo.txt

I want to copy the first 100 lines from this location to the local filesystem...

And the data is very sensitive, so I am a bit hesitant to experiment.

What is the right way to copy sample data from HDFS to the local FS?
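
For reference, a minimal Java sketch of the same idea using the FileSystem API. The input path and the 100-line count come from the question above; the local output file name and class name are made up for illustration, and the sketch assumes core-site.xml is on the classpath so FileSystem.get() resolves to the cluster's HDFS:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();      // picks up core-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);

        BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/user/input/foo.txt"))));
        PrintWriter out = new PrintWriter("foo-sample.txt");  // local file; name is illustrative

        String line;
        for (int i = 0; i < 100 && (line = in.readLine()) != null; i++) {
            out.println(line);                         // only the first 100 lines reach the local disk
        }
        out.close();
        in.close();
    }
}

Only the sampled lines are ever written locally, which keeps the exposure of the sensitive data to the 100-line sample.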


Re: Hadoop HA error JOURNAL is not supported in state standby

2013-08-29 Thread orahad bigdata
Thanks Harsh,

I don't have automatic failover configured, but I have also tried to do
this manually, without success.

hdfs haadmin -transitionToActive node1

Failed on local exception:
com.google.protobuf.InvalidProtocolBufferException: Message missing
required
fields: callId, status; Host Details : local host is: clone2/XX.XX.XX.XX;
destination host is: clone1:8020;

So is there any alternative way to resolve this issue?

Thanks

On 8/30/13, Harsh J ha...@cloudera.com wrote:
 On the actual issue though: Do you also have auto-failover configured?

 On Fri, Aug 30, 2013 at 1:39 AM, orahad bigdata oracle...@gmail.com
 wrote:
 Hi,

 I'm facing an error while starting Hadoop in HA(2.0.5) cluster , both
 the NameNode started in standby mode and not changing the state.

 When I tried to do health check through  hdfs haadmin -checkhealth
 service id  it's giving me below error.

 Failed on local exception:
 com.google.protobuf.InvalidProtocolBufferException: Message missing
 required fields: callId, status; Host Details : local host is:
 clone2/XX.XX.XX.XX; destination host is: clone1:8020;

  I checked the logs at NN side.

 2013-08-30 00:49:16,074 ERROR
 org.apache.hadoop.security.UserGroupInformation:
 PriviledgedActionException as:hadoop (auth:SIMPLE)
 cause:org.apache.hadoop.ipc.StandbyException: Operation category
 JOURNAL is not supported in state standby
 2013-08-30 00:49:16,074 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 1 on 8020, call
 org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog
 from 192.168.126.31:48266: error:
 org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is
 not supported in state standby
 2013-08-30 00:49:32,391 INFO
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering
 log roll on remote NameNode clone2:8020
 2013-08-30 00:49:32,403 WARN
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to
 trigger a roll of the active NN
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
 Operation category JOURNAL is not supported in state standby
 at
 org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1411)
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:859)
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4445)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:766)
 at
 org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
 at
 org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8758)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

 at org.apache.hadoop.ipc.Client.call(Client.java:1235)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at $Proxy11.rollEditLog(Unknown Source)
 at
 org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:139)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:268)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:310)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
 at
 org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)

 Did I missed something?

 Thanks



 --
 Harsh J



Re: Hadoop HA error JOURNAL is not supported in state standby

2013-08-29 Thread Jing Zhao
Looks like you have some incompatibility between your client side and
the server side? Are you also running 2.0.5 on your client side?

As Harsh mentioned, the NN side warning msg is not related to your
InvalidProtocolBufferException. The warning msg indicates that both of
your NN are in the Standby state.

Thanks,
-Jing

On Thu, Aug 29, 2013 at 1:36 PM, orahad bigdata oracle...@gmail.com wrote:
 Thanks  Harsh,

 I don't have auto failover configuration, but also I have tried to do
 this manually but didn't get success.

 hdfs haadmin -transitionToActive node1

 Failed on local exception:
 com.google.protobuf.InvalidProtocolBufferException: Message missing
 required
 fields: callId, status; Host Details : local host is: clone2/XX.XX.XX.XX;
 destination host is: clone1:8020;

 So is there any alternative to resolve this issue?.

 Thanks

 On 8/30/13, Harsh J ha...@cloudera.com wrote:
 On the actual issue though: Do you also have auto-failover configured?

 On Fri, Aug 30, 2013 at 1:39 AM, orahad bigdata oracle...@gmail.com
 wrote:
 Hi,

 I'm facing an error while starting Hadoop in HA(2.0.5) cluster , both
 the NameNode started in standby mode and not changing the state.

 When I tried to do health check through  hdfs haadmin -checkhealth
 service id  it's giving me below error.

 Failed on local exception:
 com.google.protobuf.InvalidProtocolBufferException: Message missing
 required fields: callId, status; Host Details : local host is:
 clone2/XX.XX.XX.XX; destination host is: clone1:8020;

  I checked the logs at NN side.

 2013-08-30 00:49:16,074 ERROR
 org.apache.hadoop.security.UserGroupInformation:
 PriviledgedActionException as:hadoop (auth:SIMPLE)
 cause:org.apache.hadoop.ipc.StandbyException: Operation category
 JOURNAL is not supported in state standby
 2013-08-30 00:49:16,074 INFO org.apache.hadoop.ipc.Server: IPC Server
 handler 1 on 8020, call
 org.apache.hadoop.hdfs.server.protocol.NamenodeProtocol.rollEditLog
 from 192.168.126.31:48266: error:
 org.apache.hadoop.ipc.StandbyException: Operation category JOURNAL is
 not supported in state standby
 2013-08-30 00:49:32,391 INFO
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering
 log roll on remote NameNode clone2:8020
 2013-08-30 00:49:32,403 WARN
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Unable to
 trigger a roll of the active NN
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
 Operation category JOURNAL is not supported in state standby
 at
 org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1411)
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:859)
 at
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.rollEditLog(FSNamesystem.java:4445)
 at
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.rollEditLog(NameNodeRpcServer.java:766)
 at
 org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolServerSideTranslatorPB.rollEditLog(NamenodeProtocolServerSideTranslatorPB.java:139)
 at
 org.apache.hadoop.hdfs.protocol.proto.NamenodeProtocolProtos$NamenodeProtocolService$2.callBlockingMethod(NamenodeProtocolProtos.java:8758)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

 at org.apache.hadoop.ipc.Client.call(Client.java:1235)
 at
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
 at $Proxy11.rollEditLog(Unknown Source)
 at
 org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:139)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.triggerActiveLogRoll(EditLogTailer.java:268)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.access$600(EditLogTailer.java:61)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:310)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
 at
 org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
 at
 org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
 at
 

Re: copy files from hdfs to local fs

2013-08-29 Thread Kim Chew
hadoop fs -copyToLocal
or
hadoop fs -get

Either one copies the whole file and won't be able to copy just part of it;
what is interesting is that there is a -tail command but no -head.

Kim


On Thu, Aug 29, 2013 at 1:35 PM, Chengi Liu chengi.liu...@gmail.com wrote:

 Ok,

   A very stupid question...

 I have a large file in

 /user/input/foo.txt

 I want to copy first 100 lines from this location to local filesystem...

 And the data is very sensitive so i am bit hesistant to experiment.

 What is the right way to copy sample data from hdfs to local fs.




Re: copy files from hdfs to local fs

2013-08-29 Thread Chengi Liu
tail would work as well... but I want to extract just (say) the first n lines
out of this file?


On Thu, Aug 29, 2013 at 1:43 PM, Kim Chew kchew...@gmail.com wrote:

 hadoop fs -copyToLocal
 or
 hadoop fs -get

 It copies the whole file and won't be able just to copy part of the file,
 what is interesting is there is a tail command but no head.

 Kim


 On Thu, Aug 29, 2013 at 1:35 PM, Chengi Liu chengi.liu...@gmail.comwrote:

 Ok,

   A very stupid question...

 I have a large file in

 /user/input/foo.txt

 I want to copy first 100 lines from this location to local filesystem...

 And the data is very sensitive so i am bit hesistant to experiment.

 What is the right way to copy sample data from hdfs to local fs.





Hadoop Yarn

2013-08-29 Thread Rajesh Jain
I have some JVM options which I want to configure only for a few nodes in the 
cluster using Hadoop YARN. How do I do it? If I edit mapred-site.xml, it 
gets applied to all the task JVMs. I just want a handful of map JVMs to have that 
option and the other map JVMs not to have it.

Thanks
Rajesh

Sent from my iPhone

TB per core sweet spot

2013-08-29 Thread Xuri Nagarin
Hi,

I realize there is no perfect spec for data nodes, as a lot depends on use
cases and workloads, but I am curious whether there are any rules of thumb or
no-go zones in terms of how many terabytes per core are OK.

So, a few questions, assuming 1 core per HDD holds:
Is there a no-go zone in terms of TB/core? I ask because I am seeing
4TB/core nodes in some clusters and wondering if that's too much.
Does TB/core depend on the core speed? For example, while a 1.8GHz core might be
able to handle 1TB, does going to 4TB require a 3.6GHz E5 Xeon core?
Is the difference between Xeon E3 and E5 dramatic or incremental?
Any comments on disk choice - SATA vs SAS, 5.9k vs 7.2k vs 10k RPM, SATA2 vs 3?

Again, I realize there is a huge YMMV factor here but I would love to hear
experiences or research people have done before picking specs for their
nodes including vendors/models.


Thanks,

Xuri


Hadoop Clients (Hive,Pig) and Hadoop Cluster

2013-08-29 Thread Raj Hadoop
Hi,
 
I am trying to set up a multi-node Hadoop cluster, and I am trying to understand 
where Hadoop clients like Hive, Pig and Sqoop would be installed in the Hadoop 
Cluster.
 
Say I have three Linux machines:
 
Node 1 - Master (Name Node, Job Tracker and Secondary Name Node)
Node 2 - Slave (Task Tracker, Data Node)
Node 3 - Slave (Task Tracker, Data Node)
 
On which machines should I install Hive? Should it, or can it, be installed on a 
separate machine? What user and privileges are required?
On which machines should I install Pig? Should it, or can it, be installed on a 
separate machine? What user and privileges are required?
On which machines should I install Sqoop? Should it, or can it, be installed on a 
separate machine? What user and privileges are required?
 
Thanks,
Raj

Re: Hadoop Clients (Hive,Pig) and Hadoop Cluster

2013-08-29 Thread Xuri Nagarin
Yes, ideally you want to setup a 4th gateway node to run clients.
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Security-Guide/AppxG-Setting-Up-Gateway.html




On Thu, Aug 29, 2013 at 3:11 PM, Raj Hadoop hadoop...@yahoo.com wrote:

 Hi,

 I am trying to setup a multi node hadoop cluster. I am trying to
 understand where hadoop clients like (Hive,Pig,Sqoop) would be installed in
 the Hadoop Cluster.

 Say - I have three Linux machines-

 Node 1- Master - (Name Node , Job Tracker and Secondary Name Node)
 Node 2- Slave (Task Tracker,Data Node)
 Node 3- Slave (Task Tracker,Data Node)

 On which machines should I install Hive? Should it be installed or Can it
 be installed on a separate machine? What user and privileges are required ?
 On which machines should I install Pig? Should it be installed or Can it
 be installed on a separate machine? What user and privileges are required ?
 On which machines should I install Sqoop? Should it be installed or Can it
 be installed on a separate machine? What user and privileges are required ?

 Thanks,
 Raj



Re: Hadoop Clients (Hive,Pig) and Hadoop Cluster

2013-08-29 Thread Peyman Mohajerian
Regarding Sqoop, you can install it wherever you have access to both your
database and the HDFS cluster. You could, e.g., install it on the namenode if
you want, as long as that machine has access to the database that is the source
or target of your data transfer.



On Thu, Aug 29, 2013 at 3:11 PM, Raj Hadoop hadoop...@yahoo.com wrote:

 Hi,

 I am trying to setup a multi node hadoop cluster. I am trying to
 understand where hadoop clients like (Hive,Pig,Sqoop) would be installed in
 the Hadoop Cluster.

 Say - I have three Linux machines-

 Node 1- Master - (Name Node , Job Tracker and Secondary Name Node)
 Node 2- Slave (Task Tracker,Data Node)
 Node 3- Slave (Task Tracker,Data Node)

 On which machines should I install Hive? Should it be installed or Can it
 be installed on a separate machine? What user and privileges are required ?
 On which machines should I install Pig? Should it be installed or Can it
 be installed on a separate machine? What user and privileges are required ?
 On which machines should I install Sqoop? Should it be installed or Can it
 be installed on a separate machine? What user and privileges are required ?

 Thanks,
 Raj



Re: Hadoop Yarn

2013-08-29 Thread Vinod Kumar Vavilapalli

You'll have to change the MapReduce code. What options are you exactly looking 
for and why should they be only applied on some nodes? Some kind of sampling?

More details can help us help you.

Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/

On Aug 29, 2013, at 1:59 PM, Rajesh Jain wrote:

 I have some jvm options which i want to configure only for a few nodes in the 
 cluster using Hadoop yarn. How do i di it. If i edit the mapred-site.xml it 
 gets applied to all the task jvms. I just want handful of map jvms to have 
 that option and other map jvm not have that options. 
 
 Thanks
 Rajesh
 
 Sent from my iPhone




Re: Hadoop Yarn

2013-08-29 Thread Rajesh Jain
Hi Vinod

These are JVM parameters to inject an agent on only some nodes, for sampling.

Is there a property for this, because a code change is not an option?

Second, is there a way to tell the JVMs how much data to process?

Thanks

Sent from my iPhone

On Aug 29, 2013, at 6:37 PM, Vinod Kumar Vavilapalli vino...@apache.org wrote:

 
 You'll have to change the MapReduce code. What options are you exactly 
 looking for and why should they be only applied on some nodes? Some kind of 
 sampling?
 
 More details can help us help you.
 
 Thanks,
 +Vinod Kumar Vavilapalli
 Hortonworks Inc.
 http://hortonworks.com/
 
 On Aug 29, 2013, at 1:59 PM, Rajesh Jain wrote:
 
 I have some jvm options which i want to configure only for a few nodes in 
 the cluster using Hadoop yarn. How do i di it. If i edit the mapred-site.xml 
 it gets applied to all the task jvms. I just want handful of map jvms to 
 have that option and other map jvm not have that options. 
 
 Thanks
 Rajesh
 
 Sent from my iPhone
 
 


secondary sort - number of reducers

2013-08-29 Thread Adeel Qureshi
I have implemented secondary sort in my MR job, and for some reason if I
don't specify the number of reducers it uses 1, which doesn't seem right
because I'm working with 800M+ records and one reducer slows things down
significantly. Is this some kind of limitation of secondary sort, that
it has to use a single reducer? That would kind of defeat the purpose of
having a scalable solution such as secondary sort. I would appreciate any
help.

Thanks
Adeel
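
For what it's worth, the reducer count has to be set explicitly; secondary sort itself does not choose it. A minimal driver sketch using the new (mapreduce) API -- the class name, job name and the count of 8 are illustrative, not taken from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SecondarySortDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "secondary-sort");
        job.setJarByClass(SecondarySortDriver.class);
        // setMapperClass / setReducerClass / setPartitionerClass /
        // setGroupingComparatorClass go here, as in the job described above.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setNumReduceTasks(8);   // without this the default of 1 reducer is used;
                                    // same effect as -D mapred.reduce.tasks=8 when run via ToolRunner
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}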


RE: copy files from hdfs to local fs

2013-08-29 Thread java8964 java8964
What's wrong with using an old Unix pipe?
hadoop fs -cat /user/input/foo.txt | head -100 > local_file

Date: Thu, 29 Aug 2013 13:50:37 -0700
Subject: Re: copy files from hdfs to local fs
From: chengi.liu...@gmail.com
To: user@hadoop.apache.org

tail will work as well.. ??? but i want to extract just (say) n lines out of 
this file?

On Thu, Aug 29, 2013 at 1:43 PM, Kim Chew kchew...@gmail.com wrote:

hadoop fs -copyToLocal
or
hadoop fs -get

It copies the whole file and won't be able just to copy part of the file, what 
is interesting is there is a tail command but no head.



Kim


On Thu, Aug 29, 2013 at 1:35 PM, Chengi Liu chengi.liu...@gmail.com wrote:


Ok,
  A very stupid question...
I have a large file in 


/user/input/foo.txt
I want to copy first 100 lines from this location to local filesystem...

And the data is very sensitive so i am bit hesistant to experiment.
What is the right way to copy sample data from hdfs to local fs.




  

Re: Issue with fs.delete

2013-08-29 Thread Abhijit Sarkar
Wow this is one helluva forum where people needing help leave the problem
to the expert's imagination. Even paid  support would close a ticket like
that without looking twice.


On Wed, Aug 28, 2013 at 4:40 AM, Harsh J ha...@cloudera.com wrote:

 Please also try to share your error/stacktraces when you post a question.

 All I can suspect is that your URI is malformed, and is missing the
 authority component. That is, it should be
 hdfs://host:port/path/to/file and not hdfs:/path/to/file.

 On Wed, Aug 28, 2013 at 1:44 PM, rab ra rab...@gmail.com wrote:
  -- Forwarded message --
  From: rab ra rab...@gmail.com
  Date: 28 Aug 2013 13:26
  Subject: Issue with fs.delete
  To: us...@hadoop.apache.org us...@hadoop.apache.org
 
  Hello,
 
  I am having a trouble in deleting a file from hdfs. I am using hadoop
 1.2.1
  stable release. I use the following code segment in my program
 
 
  fs.delete(new Path("hdfs:/user/username/input/input.txt"))
  fs.copyFromLocalFile(false, false, new Path("input.txt"), new
  Path("hdfs:/user/username/input/input.txt"))
 
  Any hint?
 
 



 --
 Harsh J
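
To make Harsh's point concrete, a hedged sketch of what the fixed calls might look like -- "namenode:9000" stands in for whatever fs.default.name actually is on the cluster, and the class name is made up:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplaceInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Full URI with an authority (host:port), as suggested above;
        // "namenode:9000" is a placeholder for the actual fs.default.name.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000/"), conf);

        Path dst = new Path("/user/username/input/input.txt");
        fs.delete(dst, false);                                        // non-recursive delete of the old file
        fs.copyFromLocalFile(false, false, new Path("input.txt"), dst);
    }
}

Equivalently, with fs.default.name configured, plain scheme-less paths like /user/username/input/input.txt avoid the malformed-URI problem entirely.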



Cache file conflict

2013-08-29 Thread Public Network Services
Hi...

After updating the source JARs of an application that launches a second job
while running an MR job, the following error keeps occurring:

org.apache.hadoop.mapred.InvalidJobConfException: cache file
(mapreduce.job.cache.files) scheme: hdfs, host: server, port: 9000,
file:
/tmp/hadoop-yarn/staging/root/.staging/job_1367474197612_0887/libjars/Some.jar,
conflicts with cache file (mapreduce.job.cache.files)
hdfs://server:9000/tmp/hadoop-yarn/staging/root/.staging/job_1367474197612_0888/libjars/Some.jar
at
org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:338)
at
org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:273)
at
org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:419)
at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:288)
at
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)


where job_1367474197612_0887 is the name of the initial job,
job_1367474197612_0888 is the name of the subsequent job, and Some.jar is a
JAR file specific to the application.

Any ideas as to how the above error could be eliminated?

Thanks!


Re: Cache file conflict

2013-08-29 Thread Omkar Joshi
You should check this:
https://issues.apache.org/jira/browse/MAPREDUCE-4493?focusedCommentId=13713706&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13713706

Thanks,
Omkar Joshi
*Hortonworks Inc.* http://www.hortonworks.com


On Thu, Aug 29, 2013 at 5:06 PM, Public Network Services 
publicnetworkservi...@gmail.com wrote:

 Hi...

 After updating the source JARs of an application that launches a second
 job while running a MR job, the following error keeps occurring:

 org.apache.hadoop.mapred.InvalidJobConfException: cache file
 (mapreduce.job.cache.files) scheme: hdfs, host: server, port: 9000,
 file:
 /tmp/hadoop-yarn/staging/root/.staging/job_1367474197612_0887/libjars/Some.jar,
 conflicts with cache file (mapreduce.job.cache.files)
 hdfs://server:9000/tmp/hadoop-yarn/staging/root/.staging/job_1367474197612_0888/libjars/Some.jar
 at
 org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:338)
 at
 org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:273)
 at
 org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:419)
 at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:288)
 at
 org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:394)
 at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1218)
 at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1215)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
 at org.apache.hadoop.mapreduce.Job.submit(Job.java:1215)
 

 where job_1367474197612_0887 is the name of the initial job,
 job_1367474197612_0888 is the name of the subsequent job, and Some.jar is a
 JAR file specific to the application.

 Any ideas as to how the above error could be eliminated?

 Thanks!




Re: secondary sort - number of reducers

2013-08-29 Thread Adeel Qureshi
so it cant figure out an appropriate number of reducers as it does for
mappers .. in my case hadoop is using 2100+ mappers and then only 1 reducer
.. since im overriding the partitioner class shouldnt that decide how
manyredeucers there should be based on how many different partition values
being returned by the custom partiotioner


On Thu, Aug 29, 2013 at 7:38 PM, Ian Wrigley i...@cloudera.com wrote:

 If you don't specify the number of Reducers, Hadoop will use the default
 -- which, unless you've changed it, is 1.

 Regards

 Ian.

 On Aug 29, 2013, at 4:23 PM, Adeel Qureshi adeelmahm...@gmail.com wrote:

 I have implemented secondary sort in my MR job and for some reason if i
 dont specify the number of reducers it uses 1 which doesnt seems right
 because im working with 800M+ records and one reducer slows things down
 significantly. Is this some kind of limitation with the secondary sort that
 it has to use a single reducer .. that kind of would defeat the purpose of
 having a scalable solution such as secondary sort. I would appreciate any
 help.

 Thanks
 Adeel



 ---
 Ian Wrigley
 Sr. Curriculum Manager
 Cloudera, Inc
 Cell: (323) 819 4075




Re: secondary sort - number of reducers

2013-08-29 Thread Adeel Qureshi
Okay, so when I specify the number of reducers, e.g. in my example I'm using
4 (for a much smaller data set), it works if I use a single column in my
composite key. But if I add multiple columns to the composite key, separated
by a delimiter, it then throws the illegal-partition error (keys before the
pipe are group keys, after the pipe are the sort keys, and my partitioner
only uses the group keys):

java.io.IOException: Illegal partition for Atlanta:GA|Atlanta:GA:1:Adeel (-1)
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
at
org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:39)
at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)


public int getPartition(Text key, HCatRecord record, int numParts) {
    // extract the group key from the composite key
    String groupKey = key.toString().split("\\|")[0];
    return groupKey.hashCode() % numParts;
}


On Thu, Aug 29, 2013 at 8:31 PM, Shekhar Sharma shekhar2...@gmail.comwrote:

 No...partitionr decides which keys should go to which reducer...and
 number of reducers you need to decide...No of reducers depends on
 factors like number of key value pair, use case etc
 Regards,
 Som Shekhar Sharma
 +91-8197243810


 On Fri, Aug 30, 2013 at 5:54 AM, Adeel Qureshi adeelmahm...@gmail.com
 wrote:
  so it cant figure out an appropriate number of reducers as it does for
  mappers .. in my case hadoop is using 2100+ mappers and then only 1
 reducer
  .. since im overriding the partitioner class shouldnt that decide how
  manyredeucers there should be based on how many different partition
 values
  being returned by the custom partiotioner
 
 
  On Thu, Aug 29, 2013 at 7:38 PM, Ian Wrigley i...@cloudera.com wrote:
 
  If you don't specify the number of Reducers, Hadoop will use the default
  -- which, unless you've changed it, is 1.
 
  Regards
 
  Ian.
 
  On Aug 29, 2013, at 4:23 PM, Adeel Qureshi adeelmahm...@gmail.com
 wrote:
 
  I have implemented secondary sort in my MR job and for some reason if i
  dont specify the number of reducers it uses 1 which doesnt seems right
  because im working with 800M+ records and one reducer slows things down
  significantly. Is this some kind of limitation with the secondary sort
 that
  it has to use a single reducer .. that kind of would defeat the purpose
 of
  having a scalable solution such as secondary sort. I would appreciate
 any
  help.
 
  Thanks
  Adeel
 
 
 
  ---
  Ian Wrigley
  Sr. Curriculum Manager
  Cloudera, Inc
  Cell: (323) 819 4075
 
 



RE: secondary sort - number of reducers

2013-08-29 Thread java8964 java8964
The method getPartition() needs to return a non-negative number; simply using the 
hashCode() method is not enough.
See the Hadoop HashPartitioner implementation:

return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;

When I first read this code, I always wondered why it doesn't use Math.abs(). Is
(& Integer.MAX_VALUE) faster?
Yong
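
To make that concrete, here is a hedged sketch of the partitioner from earlier in the thread with the sign handled the same way HashPartitioner does it. The HCatRecord import depends on which HCatalog version is on the classpath (an assumption on my part); the rest mirrors the posted snippet. Math.abs() is avoided because Math.abs(Integer.MIN_VALUE) is still negative, which is exactly the corner case the & mask covers:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hcatalog.data.HCatRecord;   // or org.apache.hive.hcatalog.data.HCatRecord, depending on version

public class GroupKeyPartitioner extends Partitioner<Text, HCatRecord> {
    @Override
    public int getPartition(Text key, HCatRecord record, int numParts) {
        // Composite key looks like "groupKey|sortKey"; partition only on the group part.
        String groupKey = key.toString().split("\\|")[0];
        // Mask off the sign bit so the result is always in [0, numParts).
        return (groupKey.hashCode() & Integer.MAX_VALUE) % numParts;
    }
}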
Date: Thu, 29 Aug 2013 20:55:46 -0400
Subject: Re: secondary sort - number of reducers
From: adeelmahm...@gmail.com
To: user@hadoop.apache.org

okay so when i specify the number of reducers e.g. in my example i m using 4 
(for a much smaller data set) it works if I use a single column in my composite 
key .. but if I add multiple columns in the composite key separated by a delimi 
.. it then throws the illegal partition error (keys before the pipe are group 
keys and after the pipe are the sort keys and my partioner only uses the group 
keys

java.io.IOException: Illegal partition for Atlanta:GA|Atlanta:GA:1:Adeel (-1)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:39)
at com.att.hadoop.hivesort.HSMapper.map(HSMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1136)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

public int getPartition(Text key, HCatRecord record, int numParts) {
    // extract the group key from the composite key
    String groupKey = key.toString().split("\\|")[0];
    return groupKey.hashCode() % numParts;
}

On Thu, Aug 29, 2013 at 8:31 PM, Shekhar Sharma shekhar2...@gmail.com wrote:

 No...partitionr decides which keys should go to which reducer...and
 number of reducers you need to decide...No of reducers depends on
 factors like number of key value pair, use case etc
 Regards,
 Som Shekhar Sharma
 +91-8197243810

 On Fri, Aug 30, 2013 at 5:54 AM, Adeel Qureshi adeelmahm...@gmail.com wrote:
  so it cant figure out an appropriate number of reducers as it does for
  mappers .. in my case hadoop is using 2100+ mappers and then only 1 reducer
  .. since im overriding the partitioner class shouldnt that decide how
  manyredeucers there should be based on how many different partition values
  being returned by the custom partiotioner
 
  On Thu, Aug 29, 2013 at 7:38 PM, Ian Wrigley i...@cloudera.com wrote:
 
   If you don't specify the number of Reducers, Hadoop will use the default
   -- which, unless you've changed it, is 1.
 
   Regards
 
   Ian.
 
   On Aug 29, 2013, at 4:23 PM, Adeel Qureshi adeelmahm...@gmail.com wrote:
 
   I have implemented secondary sort in my MR job and for some reason if i
   dont specify the number of reducers it uses 1 which doesnt seems right
   because im working with 800M+ records and one reducer slows things down
   significantly. Is this some kind of limitation with the secondary sort that
   it has to use a single reducer .. that kind of would defeat the purpose of
   having a scalable solution such as secondary sort. I would appreciate any
   help.
 
   Thanks
   Adeel
 
   ---
   Ian Wrigley
   Sr. Curriculum Manager
   Cloudera, Inc
   Cell: (323) 819 4075

Re: Hadoop Yarn

2013-08-29 Thread Hitesh Shah
Hi Rajesh,

Have you looked at re-using the profiling options to inject the JVM options into
a defined range of tasks?
http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Profiling

-- Hitesh
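
For reference, a rough sketch of what that looks like when set from the driver with the old-API JobConf setters; the agent path and the task ranges below are placeholders, and the framework substitutes %s with the per-task profile output file:

import org.apache.hadoop.mapred.JobConf;

public class ProfilingSetup {
    public static void enableProfiling(JobConf conf) {
        conf.setProfileEnabled(true);            // mapred.task.profile
        conf.setProfileTaskRange(true, "0-1");   // mapred.task.profile.maps: only map attempts 0 and 1
        conf.setProfileTaskRange(false, "");     // profile no reduce tasks
        // The agent path is a placeholder for whatever sampling agent is being injected.
        conf.setProfileParams("-agentpath:/path/to/agent.so=%s");
    }
}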

On Aug 29, 2013, at 3:51 PM, Rajesh Jain wrote:

 Hi Vinod
 
 These are jvm parameters to inject agent only on some nodes for sampling. 
 
 Is there a property because code change is not a option. 
 
 Second is there a way to tell the jvms how much data size to process. 
 
 Thanks
 
 Sent from my iPhone
 
 On Aug 29, 2013, at 6:37 PM, Vinod Kumar Vavilapalli vino...@apache.org 
 wrote:
 
 
 You'll have to change the MapReduce code. What options are you exactly 
 looking for and why should they be only applied on some nodes? Some kind of 
 sampling?
 
 More details can help us help you.
 
 Thanks,
 +Vinod Kumar Vavilapalli
 Hortonworks Inc.
 http://hortonworks.com/
 
 On Aug 29, 2013, at 1:59 PM, Rajesh Jain wrote:
 
 I have some jvm options which i want to configure only for a few nodes in 
 the cluster using Hadoop yarn. How do i di it. If i edit the 
 mapred-site.xml it gets applied to all the task jvms. I just want handful 
 of map jvms to have that option and other map jvm not have that options. 
 
 Thanks
 Rajesh
 
 Sent from my iPhone
 
 



Re: Multidata center support

2013-08-29 Thread Rahul Bhattacharjee
My take on this:

Why does Hadoop have to know about the data-center boundary at all? I think it can be
installed across multiple data centers; however, topology configuration would be
required to tell which node belongs to which data center and switch, for
block placement.

Thanks,
Rahul


On Fri, Aug 30, 2013 at 12:42 AM, Baskar Duraikannu 
baskar.duraika...@outlook.com wrote:

 We have a need to setup hadoop across data centers.  Does hadoop support
 multi data center configuration? I searched through archives and have found
 that hadoop did not support multi data center configuration some time back.
 Just wanted to see whether situation has changed.

 Please help.



RE: Hadoop Yarn - samples

2013-08-29 Thread Devaraj k
Perhaps you can try writing the same YARN application using these steps:

http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

Thanks
Devaraj k

From: Punnoose, Roshan [mailto:rashan.punnr...@merck.com]
Sent: 29 August 2013 19:43
To: user@hadoop.apache.org
Subject: Re: Hadoop Yarn - samples

Is there an example of running a sample yarn application that will only allow 
one container to start per host?

Punnoose, Roshan
rashan.punnr...@merck.com



On Aug 29, 2013, at 10:08 AM, Arun C Murthy a...@hortonworks.com wrote:


Take a look at the dist-shell example in 
http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/

I recently wrote up another simplified version of it for illustration purposes 
here: https://github.com/hortonworks/simple-yarn-app

Arun

On Aug 28, 2013, at 4:47 AM, Manickam P manicka...@outlook.com wrote:


Hi,

I have just installed the Hadoop 2.0.5 alpha version.
I want to analyse how the YARN resource manager and node managers work.
I executed the MapReduce examples, but I want to execute the samples on YARN.
I have been searching for that but am unable to find any. Please help me.



Thanks,
Manickam P

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



