Re: How to handle sensitive data
Regards, Abhishek. I agree with Michael: you can encrypt the incoming data from your application. I also recommend using HBase. - Original Message - From: Michael Segel michael_se...@hotmail.com To: common-user@hadoop.apache.org CC: cdh-u...@cloudera.org Sent: Friday, February 15, 2013 8:47:16 Subject: Re: How to handle sensitive data Simple: have your app encrypt the field prior to writing to HDFS. Also consider HBase. On Feb 14, 2013, at 10:35 AM, abhishek abhishek.dod...@gmail.com wrote: Hi all, we have some sensitive data in certain fields (columns). How should sensitive data be handled in Hadoop? How do different people handle sensitive data in Hadoop? Thanks, Abhi Michael Segel | (m) 312.755.9623 Segel and Associates -- Marcos Ortiz Valmaseda, Product Manager / Data Scientist at UCI
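To make Michael's suggestion concrete, here is a minimal sketch of encrypting a single sensitive field before the record is written to HDFS. This is an illustration, not the exact method discussed above: the key handling (rawKey) and the use of AES with the JVM's default mode are assumptions, and in production you would want an explicit cipher mode plus a real key-management strategy.

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;
import org.apache.commons.codec.binary.Base64; // already on Hadoop's classpath

public class FieldEncryptor {
    private final SecretKeySpec key;

    public FieldEncryptor(byte[] rawKey) {
        // A 16-byte key gives AES-128; key distribution is out of scope here.
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    /** Encrypts one sensitive column value; the rest of the record stays in clear text. */
    public String encryptField(String plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] enc = cipher.doFinal(plaintext.getBytes("UTF-8"));
        // Base64 keeps the ciphertext printable so it can live in a TSV/CSV row on HDFS.
        return Base64.encodeBase64String(enc);
    }
}

Your ingest code (or a mapper) would call encryptField only on the sensitive columns before emitting each record.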
Re: Which hardware to choose
What is a reasonable number for this hardware? On 10/02/2012 09:40 PM, Michael Segel wrote: I think he's saying that it's 24 map and 8 reduce slots per node, and at 48GB that could be too many mappers, especially if they want to run HBase. On Oct 2, 2012, at 8:14 PM, hadoopman hadoop...@gmail.com wrote: Only 24 map and 8 reduce tasks for 38 data nodes? Are you sure that's right? That sounds VERY low for a cluster of that size. We have only 10 C2100s and are running, I believe, 140 map and 70 reduce slots so far with pretty decent performance. On 10/02/2012 12:55 PM, Alexander Pivovarov wrote: 38 data nodes + 2 name nodes. Data node: Dell PowerEdge C2100 series, 2 x Xeon X5670, 48 GB ECC RAM (12x4GB, 1333MHz), 12 x 2 TB 7200 RPM SATA HDD (hot-swap, JBOD), Intel Gigabit ET dual-port PCIe x4, redundant power supply. Hadoop CDH3; max map tasks 24, max reduce tasks 8. -- Marcos Luis Ortíz Valmaseda, Data Engineer / Sr. System Administrator at UCI
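For reference, the per-node slot counts being debated here are set in mapred-site.xml on each TaskTracker in Hadoop 1.x/CDH3. A minimal sketch with the values quoted above; tune them to your cores and RAM (a common rule of thumb is roughly one slot per physical core, leaving headroom for the DataNode and, if present, HBase):

<!-- mapred-site.xml on each TaskTracker -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>24</value> <!-- concurrent map slots on this node -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>8</value> <!-- concurrent reduce slots on this node -->
  </property>
</configuration>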
Re: Hadoop 1.0.3 setup
On 07/09/2012 09:58 AM, prabhu K wrote: Yes, I have configured a multinode setup, 1 master and 2 slaves; I formatted the namenode and then ran the start-dfs.sh and start-mapred.sh scripts. When I run the bin/hadoop fs -put input input command, I get the following error on my terminal. hduser@md-trngpoc1:/usr/local/hadoop_dir/hadoop$ bin/hadoop fs -put input input Warning: $HADOOP_HOME is deprecated. put: org.apache.hadoop.security.AccessControlException: Permission denied: user=hduser, access=WRITE, inode=:root:supergroup:rwxr-xr-x I also executed the command below, which returns the /hadoop-install/hadoop directory; I can't understand what I am doing wrong. Well, this error tells you that the permissions on that directory are wrong: the owner and group you have are root:supergroup, and the correct values are hduser:supergroup. hduser@md-trngpoc1:/usr/local/hadoop_dir/hadoop$ echo $HADOOP_HOME /hadoop-install/hadoop *Namenode log:* == java.lang.InterruptedException: sleep interrupted at java.lang.Thread.sleep(Native Method) at org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65) at java.lang.Thread.run(Thread.java:662) 2012-07-09 19:02:12,696 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException: Problem binding to md-trngpoc1/10.5.114.110:54310 : Address already in use It seems that something is already bound to that address:port. Use these commands: netstat -puta | grep namenode netstat -puta | grep datanode to check which ports the NN and DN are using. at org.apache.hadoop.ipc.Server.bind(Server.java:227) at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301) at org.apache.hadoop.ipc.Server.init(Server.java:1483) at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545) at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506) at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:294) at org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:496) at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279) at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind(Native Method) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59) at org.apache.hadoop.ipc.Server.bind(Server.java:225) ... 8 more *Datanode log* = 2012-07-09 18:44:39,949 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = md-trngpoc3/10.5.114.168 STARTUP_MSG: args = [] STARTUP_MSG: version = 1.0.3 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012 / 2012-07-09 18:44:40,039 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties 2012-07-09 18:44:40,047 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered. 2012-07-09 18:44:40,048 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s). 2012-07-09 18:44:40,048 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started 2012-07-09 18:44:40,125 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-07-09 18:44:40,163 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: can not create directory: /app/hadoop_dir/hadoop/tmp/dfs/data 2012-07-09 18:44:40,163 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid. 2012-07-09 18:44:40,163 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode 2012-07-09 18:44:40,164 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG: / SHUTDOWN_MSG: Shutting down DataNode at md-trngpoc3/10.5.114.168 / 2012-07-09 18:46:09,586 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG: / STARTUP_MSG: Starting DataNode STARTUP_MSG: host = md-trngpoc3/10.5.114.168 STARTUP_MSG: args = [] STARTUP_MSG: version = 1.0.3 STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1335192; compiled by 'hortonfo' on Tue May 8 20:31:25 UTC 2012
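Putting the suggestions above together, a sketch of the usual fixes; the paths and users are taken from this thread and may differ on your system:

# Fix ownership of the user's HDFS home directory (run as the user that started the NameNode)
hadoop fs -chown -R hduser:supergroup /user/hduser

# Find which process is already bound to the NameNode port 54310
netstat -tulpn | grep 54310

# Make sure dfs.data.dir exists locally and is writable by the DataNode user
mkdir -p /app/hadoop_dir/hadoop/tmp/dfs/data
chown -R hduser:hadoop /app/hadoop_dir/hadoop/tmp/dfs/data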
Re: Versions
On 07/07/2012 02:39 PM, Harsh J wrote: The Apache Bigtop project was started for this very purpose (building stable, well inter-operating version stacks). Have a read at http://incubator.apache.org/bigtop/ and, for 1.x Bigtop packages, see https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop To specifically answer your question, though: your list appears fine to me. They 'should work', but I am not suggesting that I have tested this stack completely myself. On Sat, Jul 7, 2012 at 11:57 PM, prabhu K prabhu.had...@gmail.com wrote: Hi users list, I am planning to install the following tools: Hadoop 1.0.3, Hive 0.9.0, Flume 1.2.0, HBase 0.92.1, Sqoop 1.4.1. My only suggestion here is that you use the 0.94 version of HBase; it has a lot of improvements over 0.92.1. See Cloudera's blog post about it: http://www.cloudera.com/blog/2012/05/apache-hbase-0-94-is-now-released/ Best wishes. My questions are: 1. Are the above tools compatible with each other at these versions? 2. Does any tool need a different version? 3. Please list all the tools with compatible versions. Please advise on this. -- Marcos Luis Ortíz Valmaseda, Data Engineer / Sr. System Administrator at UCI
Re: set up Hadoop cluster on mixed OS
I have a mixed cluster too, with Linux (CentOS) and Solaris. The one recommendation I can give you is to use exactly the same Hadoop version on all machines. Best wishes. On 07/06/2012 05:31 AM, Senthil Kumar wrote: You can set up a Hadoop cluster in a mixed environment. We have a cluster with Mac, Linux and Solaris. Regards, Senthil On Fri, Jul 6, 2012 at 1:50 PM, Yongwei Xing jdxyw2...@gmail.com wrote: I have one MBP with 10.7.4 and one laptop with Ubuntu 12.04. Is it possible to set up a Hadoop cluster in such a mixed environment? Best Regards, -- Welcome to my ET Blog http://www.jdxyw.com -- Marcos Luis Ortíz Valmaseda, Data Engineer / Sr. System Administrator at UCI
Re: Yarn job runs in Local Mode even though the cluster is running in Distributed Mode
According to the official CDH4 documentation, you should install a JobHistory server for your MRv2 (YARN) cluster: https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster How to configure the HistoryServer: https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster#DeployingMapReducev2%28YARN%29onaCluster-Step3 On 06/13/2012 03:16 PM, anil gupta wrote: Hi All, I am using CDH4 for running an HBase cluster on CentOS 6.0. I have 5 nodes in my cluster (2 admin nodes and 3 DN). My ResourceManager is up and running and shows that all three DN are running the NodeManager. HDFS is also working fine and shows 3 DNs. But when I fire the pi example job, it starts to run in local mode. Here is the console output: sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 10 10 Number of Maps = 10 Samples per Map = 10 Wrote input for Map #0 Wrote input for Map #1 Wrote input for Map #2 Wrote input for Map #3 Wrote input for Map #4 Wrote input for Map #5 Wrote input for Map #6 Wrote input for Map #7 Wrote input for Map #8 Wrote input for Map #9 Starting Job 12/06/13 12:03:27 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id 12/06/13 12:03:27 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 12/06/13 12:03:27 INFO util.NativeCodeLoader: Loaded the native-hadoop library 12/06/13 12:03:27 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 12/06/13 12:03:28 INFO mapred.FileInputFormat: Total input paths to process : 10 12/06/13 12:03:29 INFO mapred.JobClient: Running job: job_local_0001 12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter set in config null 12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter 12/06/13 12:03:29 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead 12/06/13 12:03:29 INFO util.ProcessTree: setsid exited with exit code 0 12/06/13 12:03:29 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3d46e381 12/06/13 12:03:29 WARN mapreduce.Counters: Counter name MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group name and BYTES_READ as counter name instead 12/06/13 12:03:29 INFO mapred.MapTask: numReduceTasks: 1 12/06/13 12:03:29 INFO mapred.MapTask: io.sort.mb = 100 12/06/13 12:03:30 INFO mapred.MapTask: data buffer = 79691776/99614720 12/06/13 12:03:30 INFO mapred.MapTask: record buffer = 262144/327680 12/06/13 12:03:30 INFO mapred.JobClient: map 0% reduce 0% 12/06/13 12:03:35 INFO mapred.LocalJobRunner: Generated 95735000 samples. 12/06/13 12:03:36 INFO mapred.JobClient: map 100% reduce 0% 12/06/13 12:03:38 INFO mapred.LocalJobRunner: Generated 151872000 samples.
Here is the content of yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/disk/yarn/local</value>
  </property>
  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/disk/yarn/logs</value>
  </property>
  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>
  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $YARN_HOME/*,$YARN_HOME/lib/*</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ihub-an-g1:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ihub-an-g1:8040</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ihub-an-g1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ihub-an-g1:8141</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ihub-an-g1:8088</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/disk/mapred/jobhistory/intermediate/done</value>
  </property>
  <property>
Re: Yarn job runs in Local Mode even though the cluster is running in Distributed Mode
Can you share with us (in pastebin) all the conf files that you are using for YARN? On 06/13/2012 05:26 PM, anil gupta wrote: Hi Marcos, Sorry, I forgot to mention that the JobHistory server is installed and running; AFAIK the ResourceManager is responsible for running MR jobs, and the HistoryServer is only used to get info about MR jobs. Thanks, Anil On Wed, Jun 13, 2012 at 2:04 PM, Marcos Ortiz mlor...@uci.cu wrote: [quoting the previous message in full: the JobHistory server links, the console output, and the yarn-site.xml shown above]
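For readers who hit the same symptom: job_local_0001 and mapred.LocalJobRunner in the output mean the client never submitted the job to YARN at all. One common cause (offered here as a hint, not as the confirmed fix in this thread) is that the client's mapred-site.xml does not declare the framework. A minimal sketch:

<!-- mapred-site.xml on the node submitting the job -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value> <!-- without this, the client falls back to the LocalJobRunner -->
  </property>
</configuration>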
Re: No space left on device
Do you have the JT and NN on the same node? Have a look at Lars Francke's post: http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html It is a very good outline of how to install Hadoop; look at the configuration he used for the name and data directories. If these directories are on the same disk and you don't have enough space, you can hit that exception. My recommendation is to split these directories across separate disks, with a layout very similar to Lars's configuration. Another recommendation is to check Hadoop's logs; read about this here: http://www.cloudera.com/blog/2010/11/hadoop-log-location-and-retention/ Regards. On 05/28/2012 02:20 AM, yingnan.ma wrote: OK, I found it: the JobTracker server's disk is full. 2012-05-28 yingnan.ma From: yingnan.ma Sent: 2012-05-28 13:01:56 To: common-user Cc: Subject: No space left on device Hi, I encountered the following problem: Error - Job initialization failed: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:201) at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65) at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123) at java.io.FilterOutputStream.close(FilterOutputStream.java:140) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86) at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:348) at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61) at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86) at org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:1344) .. So I think that HDFS is full or something, but I cannot find a way to address the problem. If you have any suggestions, please show me; thank you. Best Regards -- Marcos Luis Ortíz Valmaseda, Data Engineer / Sr. System Administrator at UCI
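A quick way to confirm which disk (and which Hadoop directory) is actually full; the paths below are examples, so substitute your configured log and hadoop.tmp.dir locations:

# Show per-filesystem usage; look for anything at or near 100%
df -h

# Rank the biggest consumers under the Hadoop local and log directories
du -sh /var/log/hadoop/* /app/hadoop/tmp/* 2>/dev/null | sort -h | tail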
Re: EOFException at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)......
Regards, Waqas. I think that you have to ask the MapR experts. On 05/25/2012 05:42 AM, waqas latif wrote: Hi Experts, I am fairly new to Hadoop MapR and I was trying to run the matrix multiplication example presented by Mr. Norstadt at the following link: http://www.norstad.org/matrix-multiply/index.html. I can run it successfully with Hadoop 0.20.2, but when I try to run it with Hadoop 1.0.3 I get the following error. Is it a problem with my Hadoop configuration, or a compatibility problem in the code, which the author wrote for Hadoop 0.20? Please also guide me on how I can fix this error in either case. Here is the error I am getting. The same code that you wrote for 0.20.2 should work in 1.0.3 too. Exception in thread main java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at java.io.DataInputStream.readFully(DataInputStream.java:152) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1486) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470) at TestMatrixMultiply.fillMatrix(TestMatrixMultiply.java:60) at TestMatrixMultiply.readMatrix(TestMatrixMultiply.java:87) at TestMatrixMultiply.checkAnswer(TestMatrixMultiply.java:112) at TestMatrixMultiply.runOneTest(TestMatrixMultiply.java:150) at TestMatrixMultiply.testRandom(TestMatrixMultiply.java:278) at TestMatrixMultiply.main(TestMatrixMultiply.java:308) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Thanks in advance. Regards, Waqas Can you post the complete log for this? Best wishes -- Marcos Luis Ortíz Valmaseda, Data Engineer / Sr. System Administrator at UCI
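For context, the failing call is the SequenceFile.Reader constructor, which reads the file header first; an EOFException there usually means the input is truncated or is not actually a SequenceFile. A minimal standalone reader in the stable (pre-2.x) API, with a hypothetical path and placeholder key/value types (use the classes actually recorded in your file's header):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;

public class SeqFileDump {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/tmp/matrix/part-00000"); // hypothetical input file
        // The constructor reads the header; truncated or non-SequenceFile
        // input fails right here with an EOFException.
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, path, conf);
        try {
            IntWritable key = new IntWritable();   // placeholder key type
            IntWritable value = new IntWritable(); // placeholder value type
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        } finally {
            reader.close();
        }
    }
}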
Re: While Running in cloudera version of hadoop getting error
Why not use the same Hadoop version in both clusters? It will bring you fewer problems. On 05/24/2012 02:26 PM, samir das mohapatra wrote: Hi, I created an application jar and was trying to run it in a 2-node cluster using the Cloudera 0.20 version; it was running fine. But when I run that same jar on the deployment server (Cloudera version 0.20.x) with a 40-node cluster, I get an error. Could anyone please help me with this? 12/05/24 09:39:09 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. As the warning says here, you should implement Tool in your MapReduce job. 12/05/24 09:39:10 INFO mapred.FileInputFormat: Total input paths to process : 1 12/05/24 09:39:10 INFO mapred.JobClient: Running job: job_201203231049_12426 12/05/24 09:39:11 INFO mapred.JobClient: map 0% reduce 0% 12/05/24 09:39:20 INFO mapred.JobClient: Task Id : attempt_201203231049_12426_m_00_0, Status : FAILED java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325) at org.apache.hadoop.mapred.Child$4.run(Child.java:270) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157) at org.apache.hadoop.mapred.Child.main(Child.java:264) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav attempt_201203231049_12426_m_00_0: getDefaultExtension() 12/05/24 09:39:20 INFO mapred.JobClient: Task Id : attempt_201203231049_12426_m_01_0, Status : FAILED Thanks, samir -- Marcos Luis Ortíz Valmaseda, Data Engineer / Sr. System Administrator at UCI
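A minimal skeleton of the Tool pattern the warning asks for, using the old mapred API that matches the stack trace above; the class name and job settings are placeholders:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // getConf() already contains any -D/-libjars options parsed by ToolRunner
        JobConf job = new JobConf(getConf(), MyJob.class);
        job.setJobName("my-job"); // placeholder: set mapper, reducer, and paths here
        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner applies GenericOptionsParser, which removes the warning
        System.exit(ToolRunner.run(new MyJob(), args));
    }
}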
Re: Is it okay to upgrade from CDH3U2 to hadoop 1.0.2 and hbase 0.92.1?
I think that you should follow the CDH4 Beta 2 docs, specifically the known issues for this version: https://ccp.cloudera.com/display/CDH4B2/Known+Issues+and+Work+Arounds+in+CDH4 Then see the HBase installation and upgrade notes for this version: https://ccp.cloudera.com/display/CDH4B2/HBase+Installation#HBaseInstallation-InstallingHBase Another thing to keep in mind is that with HBase 0.92.1 you will need a full cluster restart, because the wire protocol changed from 0.90 to 0.92, so rolling restarts do not work here. Best wishes. On 05/21/2012 10:44 PM, edward choi wrote: Hi, I have used CDH3U2 for almost a year now. Since it is a quite old distribution, there are certain glitches that keep bothering me, so I was considering upgrading to Hadoop 1.0.3 and HBase 0.92.1. My concern is: is it okay to just install the new packages and set the configurations the same as before? Or do I need to download all the files on HDFS to a local hard drive and upload them again once the new packages are installed? (That would be a horrible job to do, though.) Any advice will be helpful. Thanks. Ed -- Marcos Luis Ortíz Valmaseda, Data Engineer / Sr. System Administrator at UCI
Re: hadoop on fedora 15
On 04/26/2012 01:49 AM, john cohen wrote: I had the same issue. My problem was being connected to the work VPN while working with M/R jobs on my Mac. It occurred to me that maybe Hadoop was binding to the wrong IP (the IP given to you after connecting through the VPN); bottom line, I disconnected from the VPN, and the M/R job finished as expected after that. This makes sense because, once you connect to the VPN, your machine gets a different IP, assigned by the private network. You can test this by changing the configured IPs to the new ones assigned by the VPN. -- Marcos Luis Ortíz Valmaseda (@marcosluis2186), Data Engineer at UCI
Yahoo Hadoop Tutorial with new APIs?
Regards to all the list. Many people use the Hadoop Tutorial released by Yahoo at http://developer.yahoo.com/hadoop/tutorial/ http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining The main issue is that this tutorial is written against the old APIs (Hadoop 0.18, I think). Is there a project to update this tutorial to the new APIs, i.e., Hadoop 1.0.2 or YARN (Hadoop 0.23)? Best wishes -- Marcos Luis Ortíz Valmaseda (@marcosluis2186), Data Engineer at UCI
Re: Yahoo Hadoop Tutorial with new APIs?
On 04/04/2012 09:15 AM, Jagat Singh wrote: Hello Marcos, Yes, the Yahoo tutorials are pretty old, but they still explain the concepts of MapReduce and HDFS beautifully. The way the tutorials are divided into sub-sections, each building on the previous one, is awesome. I remember when I started, I was dug in there for many days. The tutorials are now lagging from the new-API point of view. Yes; precisely because of that quality, this tutorial is read by many Hadoop newcomers, so I think it needs an update. Let's have a documentation session one day; I would love to volunteer to update those tutorials if the people at Yahoo take input from the outside world :) I want to help with this too, so we need to talk with our Hadoop colleagues to make it happen. Regards and best wishes. Regards, Jagat - Original Message - From: Marcos Ortiz Sent: 04/04/12 08:32 AM To: common-user@hadoop.apache.org, hdfs-u...@hadoop.apache.org, mapreduce-u...@hadoop.apache.org Subject: Yahoo Hadoop Tutorial with new APIs? [original message quoted above] -- Marcos Luis Ortíz Valmaseda (@marcosluis2186), Data Engineer at UCI
Re: opensuse 12.1
Since OpenSUSE is an RPM-based distribution, you can try the Apache Bigtop project [1]: look for the RPM packages and give them a try. Note that the RPM spec format differs a little between OpenSUSE and Red Hat-based distributions, but it can be a starting point. See the documentation for the project [2]. [1] http://incubator.apache.org/projects/bigtop.html [2] https://cwiki.apache.org/confluence/display/BIGTOP/Index%3bjsessionid=AA31645DFDAE1F3282D0159DB9B6AE9A Regards. On 04/04/2012 12:24 PM, Raj Vishwanathan wrote: Lots of people seem to start with this: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ Raj From: Barry, Sean F sean.f.ba...@intel.com To: common-user@hadoop.apache.org Sent: Wednesday, April 4, 2012 9:12 AM Subject: opensuse 12.1 What is the best way to install Hadoop on OpenSUSE 12.1 for a small two-node cluster? -SB -- Marcos Luis Ortíz Valmaseda (@marcosluis2186), Data Engineer at UCI
Re: Yahoo Hadoop Tutorial with new APIs?
OK, Robert, I will be waiting for you then. Many folks use this tutorial, so I think this is a good effort in favor of the Hadoop community. It would be nice if Yahoo! donated this work, because I have some ideas building on it: for example, releasing a Spanish version of the tutorial. Regards and best wishes. On 04/04/2012 05:29 PM, Robert Evans wrote: I am dropping the cross-posts and leaving this on common-user, with the others BCCed. Marcos, that is a great idea, to be able to update the tutorial, especially if the community is interested in helping to do so. We are looking into the best way to do this. The idea right now is to donate this to the Hadoop project so that the community can keep it up to date, but we need some time to jump through all of the corporate hoops to make this happen. We have a lot going on right now, so if you don't see any progress on this, please feel free to ping me and bug me about it. -- Bobby Evans On 4/4/12 8:15 AM, Jagat Singh jagatsi...@gmail.com wrote: [Jagat's reply and the original message, quoted in the thread above] -- Marcos Luis Ortíz Valmaseda (@marcosluis2186), Data Engineer at UCI
Re: Job tracker service start issue.
On 03/23/2012 06:57 AM, kasi subrahmanyam wrote: Hi Oliver, I am not sure whether my suggestion will solve your problem, or whether it is already solved on your side. It seems the task tracker is having a problem accessing the tmp directory. Try going to the core and mapred site XML files and changing the tmp directory to a new one. If that is still not working, then manually change the permissions of that directory using: chmod -R 777 tmp Please don't do chmod -R 777 on the tmp directory; it is not advisable for production servers. The first option is wiser: 1. change the tmp directory in the core-site and mapred-site files; 2. chown this new directory to the hadoop group, which contains the mapred and hdfs users. On Fri, Mar 23, 2012 at 3:33 PM, Olivier Sallou olivier.sal...@irisa.fr wrote: On 3/23/12 8:50 AM, Manish Bhoge wrote: I have Hadoop running on a standalone box. When I start the daemons for the namenode, secondarynamenode, job tracker, task tracker and data node, they start gracefully. But soon after it starts, the job tracker service disappears: when I run 'jps' it shows me all the services, including the task tracker, except the job tracker. Is there a time limit that needs to be set, or is it going into safe mode? When I look at the job tracker log, this is what it shows; it looks like it starts and soon afterwards shuts down: 2012-03-22 23:26:04,061 INFO org.apache.hadoop.mapred.JobTracker: STARTUP_MSG: / STARTUP_MSG: Starting JobTracker STARTUP_MSG: host = manish/10.131.18.119 STARTUP_MSG: args = [] STARTUP_MSG: version = 0.20.2-cdh3u3 STARTUP_MSG: build = file:///data/1/tmp/nightly_2012-02-16_09-46-24_3/hadoop-0.20-0.20.2+923.195-1~maverick -r 217a3767c48ad11d4632e19a22897677268c40c4; compiled by 'root' on Thu Feb 16 10:22:53 PST 2012 / 2012-03-22 23:26:04,140 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens 2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Starting expired delegation token remover thread, tokenRemoverScanInterval=60 min(s) 2012-03-22 23:26:04,141 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens 2012-03-22 23:26:04,142 INFO org.apache.hadoop.mapred.JobTracker: Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT, limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1) 2012-03-22 23:26:04,143 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list 2012-03-22 23:26:04,186 INFO org.apache.hadoop.mapred.JobTracker: Starting jobtracker with owner as mapred 2012-03-22 23:26:04,201 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 54311 2012-03-22 23:26:04,203 INFO org.apache.hadoop.ipc.metrics.RpcMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311 2012-03-22 23:26:04,206 INFO org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics with hostName=JobTracker, port=54311 2012-03-22 23:26:09,250 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog 2012-03-22 23:26:09,298 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter) 2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is
-1. Opening the listener on 50030 2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: listener.getLocalPort() returned 50030 webServer.getConnectors()[0].getLocalPort() returned 50030 2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Jetty bound to port 50030 2012-03-22 23:26:09,319 INFO org.mortbay.log: jetty-6.1.26.cloudera.1 2012-03-22 23:26:09,517 INFO org.mortbay.log: Started SelectChannelConnector@0.0.0.0:50030 2012-03-22 23:26:09,519 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker up at: 54311 2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker: JobTracker webserver: 50030 2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: Failed to operate on mapred.system.dir (hdfs://localhost:54310/app/hadoop/tmp/mapred/system) because of permissions. 2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: This directory should be owned by the user 'mapred (auth:SIMPLE)' 2012-03-22 23:26:09,650 WARN org.apache.hadoop.mapred.JobTracker: Bailing out ... org.apache.hadoop.security.AccessControlException: The systemdir
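The last WARN lines above point at the real failure: mapred.system.dir on HDFS (hdfs://localhost:54310/app/hadoop/tmp/mapred/system) is not owned by the mapred user. A sketch of the usual fix on a packaged CDH3 install, using the path taken from the log; adjust the superuser invocation to your setup:

# Create the MapReduce system directory on HDFS and hand it to mapred
sudo -u hdfs hadoop fs -mkdir /app/hadoop/tmp/mapred/system
sudo -u hdfs hadoop fs -chown -R mapred /app/hadoop/tmp/mapred/system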
Apache Hadoop works with IPv6?
Regards. I'm very interested to know whether Apache Hadoop works with IPv6 hosts. One of my clients has some hosts with this feature, and they want to know if Hadoop supports it. Has anyone tested this? Best wishes -- Marcos Luis Ortíz Valmaseda (@marcosluis2186), Data Engineer at UCI
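For what it's worth, the Hadoop wiki of this era states that Hadoop does not support IPv6, and the usual workaround on dual-stack hosts is to force the JVMs onto the IPv4 stack. A sketch of that setting in hadoop-env.sh (verify against your distribution's documentation):

# hadoop-env.sh: prefer the IPv4 stack on dual-stack hosts
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"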
Re: Reduce copy speed too slow
Hi, Gayatri. On 03/20/2012 11:59 AM, Gayatri Rao wrote: Hi all, I am running a MapReduce job on EC2 instances and it seems to be very slow; it takes hours for a simple projection and aggregation of the data. What filesystem are you using for data storage: HDFS in EC2 or Amazon S3? And what is the size of the data you are analyzing? From observation, I gathered that the reduce copy speed is 0.01 MB/sec. I am new to Hadoop. Could anyone please share insights about what reduce copy speeds are good to work with, and, if anyone has experience, any tips on improving it? Hadoop MapReduce jobs shuffle lots of data, so the recommended configuration is to use 10Gbps networks for the underlying connection (and dedicated switches on dual-gigabit networks). Remember too that Hadoop is not a real-time system; if you need real-time random access to your data, use HBase: http://hbase.apache.org Regards. Thanks, Gayatri -- Marcos Luis Ortíz Valmaseda (@marcosluis2186), Data Engineer at UCI
Re: Best practice to setup Sqoop,Pig and Hive for a hadoop cluster ?
On 03/15/2012 09:22 AM, Manu S wrote: Thanks a lot, Bejoy, that makes sense :) Suppose I have a MySQL database on some other node (not in the Hadoop cluster); can I import its tables into HDFS using Sqoop? Yes, this is the main purpose of Sqoop; a sketch of such an import follows at the end of this thread. On the Cloudera site you have the complete documentation for it: Sqoop User Guide http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html Sqoop installation https://ccp.cloudera.com/display/CDHDOC/Sqoop+Installation Sqoop for MySQL http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_mysql Sqoop site on GitHub http://github.com/cloudera/sqoop Cloudera blog posts related to Sqoop http://www.cloudera.com/blog/category/sqoop/ Best wishes. On Thu, Mar 15, 2012 at 6:27 PM, Bejoy Ks bejoy.had...@gmail.com wrote: Hi Manu, please find my responses inline. I had read that we can install Pig, Hive and Sqoop on the client node, with no need to install them in the cluster. What is the client node, actually? Can I use my management node as a client? On larger clusters there is a separate node, outside the Hadoop cluster, from which user programs are triggered. This is the node referred to as the client node / edge node. For your cluster, the management node and the client node can be the same. What is the best practice for installing Pig, Hive and Sqoop? On a client node. For a fully distributed cluster, do we need to install Pig, Hive and Sqoop on each node? No; they can be on a client node or on any one of the nodes. MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL database into HDFS, Hive or Pig; so can we make use of MySQL DBs residing on another node? Regarding your first point: Sqoop import serves a different purpose, getting data from an RDBMS into HDFS, but the metastore is used by Hive in framing the MapReduce jobs corresponding to your Hive query, so Sqoop can't help you much there. It is recommended to have Hive's metastore DB on the same node where Hive is installed, because executing Hive queries requires a lot of metadata lookups, especially when your tables have a large number of partitions. Regards, Bejoy K.S. On Thu, Mar 15, 2012 at 5:34 PM, Manu S manupk...@gmail.com wrote: Greetings All! I am using Cloudera CDH3 for my Hadoop deployment. We have 7 nodes, of which 5 are used for a fully distributed cluster, 1 for pseudo-distributed mode, and 1 as a management node. Fully distributed cluster: HDFS, MapReduce and HBase. Pseudo-distributed mode: everything. I had read that we can install Pig, Hive and Sqoop on the client node, with no need to install them in the cluster. What is the client node, actually? Can I use my management node as a client? What is the best practice for installing Pig, Hive and Sqoop? For a fully distributed cluster, do we need to install Pig, Hive and Sqoop on each node? MySQL is needed for Hive as a metastore, and Sqoop can import a MySQL database into HDFS, Hive or Pig; so can we make use of MySQL DBs residing on another node? -- Thanks & Regards, Manu S, SI Engineer - OpenSource & HPC, Wipro Infotech, Mob: +91 8861302855, Skype: manuspkd, www.opensourcetalk.co.in -- Marcos Luis Ortíz Valmaseda, Sr. Software Engineer (UCI)
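To make the MySQL import concrete, here is a sketch of a Sqoop 1 command; the host, database, table, credentials and target directory are placeholders:

sqoop import \
  --connect jdbc:mysql://db-host.example.com/salesdb \
  --username sqoop_user -P \
  --table customers \
  --target-dir /user/hduser/customers \
  --num-mappers 4

The -P flag prompts for the password interactively, and --num-mappers controls how many parallel map tasks split the import. The MySQL JDBC driver jar must be on Sqoop's classpath (usually its lib/ directory).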
Re: hadoop branch-0.20-append Build error:build.xml:933: exec returned: 1
On 4/11/2011 10:45 PM, Alex Luya wrote: BUILD FAILED .../branch-0.20-append/build.xml:927: The following error occurred while executing this line: .../branch-0.20-append/build.xml:933: exec returned: 1 Total time: 1 minute 17 seconds + RESULT=1 + '[' 1 '!=' 0 ']' + echo 'Build Failed: 64-bit build not run' Build Failed: 64-bit build not run + exit 1 - I checked the content of build.xml. Line 927: <antcall target="cn-docs"/> (inside a target that calls <target name="cn-docs" depends="forrest.check, init" description="Generate forrest-based Chinese documentation. To use, specify -Dforrest.home=&lt;base of Apache Forrest installation&gt; on the command line." if="forrest.home">). Line 933: <exec dir="${src.docs.cn}" executable="${forrest.home}/bin/forrest" failonerror="true"> --- It seems to be trying to execute Forrest; what is the problem here? I am running 64-bit Ubuntu, with 64- and 32-bit JDK 1.6 and a 64-bit JDK 1.5 installed. Some people said there are some tricks on this page: http://wiki.apache.org/hadoop/HowToRelease to get the Forrest build to work, but I can't find any tricks on the page. Any help is appreciated. 1. Which version of Java does your JAVA_HOME variable point to? You can browse the Forrest page to see how to build it: http://forrest.apache.org 2. Another question for you: do you actually need Forrest? Regards -- Marcos Luís Ortíz Valmaseda, Software Engineer (Large-Scale Distributed Systems), University of Information Sciences, La Habana, Cuba, Linux User #418229
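A sketch of the two ways out, depending on the answer to question 2; the paths are examples, and note that Apache Forrest 0.8 itself requires a Java 5 JDK, which is why the branch-0.20 build wants both JDKs:

# If you only need the Hadoop jars, skip the Forrest documentation targets entirely
ant clean jar

# If you do need the full release build with docs, point the build at Forrest
# and at a Java 5 JDK on the command line
ant -Dforrest.home=/opt/apache-forrest-0.8 \
    -Djava5.home=/usr/lib/jvm/java-1.5.0-sun \
    clean tar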