Versions
Hi users list,

I am planning to install the following tools:

Hadoop 1.0.3
Hive 0.9.0
Flume 1.2.0
HBase 0.92.1
Sqoop 1.4.1

My questions are:
1. Are the above tools compatible with each other at these versions?
2. Does any tool need a different version?
3. Please list all the tools with compatible versions.

Any suggestions would be appreciated.
Re: Versions
The Apache Bigtop project was started for this very purpose (building stable, well inter-operating version stacks). Take a read at http://incubator.apache.org/bigtop/ and, for 1.x Bigtop packages, see https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop

To answer your question specifically, though: your list appears fine to me. It 'should work', but I am not claiming to have tested this stack completely myself.

-- Harsh J
Re: Versions
My only suggestion here is that you use the 0.94 version of HBase; it has a lot of improvements over 0.92.1. See Cloudera's blog post about it: http://www.cloudera.com/blog/2012/05/apache-hbase-0-94-is-now-released/

Best wishes,

-- Marcos Luis Ortíz Valmaseda
Data Engineer & Sr. System Administrator at UCI
http://www.uci.cu
Re: Versions of Jetty Log4j in CDHu3
Oh OK, thanks Chaidy. I was wondering if I could just use log4j's compression facility together with time-based rolling, i.e. gzip the rolled files and use less disk space. This seems to be a feature available in log4j 1.3 (I'm not sure whether it is also available in log4j-1.2.15); I think I need to give it a try and see. On the other hand, if it does not work, is there a process I should follow to upgrade to log4j 1.3 and still expect Hadoop to be compatible with the new log4j library (version changes?)

Thanks, Nikhil

On Sun, Apr 29, 2012 at 11:07 AM, CHAIDY cha...@nsccsz.gov.cn wrote:
Hi, Nikhil! FYI: jetty-6.1.26, log4j-1.2.15
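For reference, time-based rolling with gzip compression does not require moving to log4j 1.3: it is available via `TimeBasedRollingPolicy` in the log4j "extras" companion jar, which works alongside log4j 1.2.x. A minimal log4j.properties sketch, assuming the extras jar is on the classpath and `RFA` is the appender name (the file path here is an example, not Hadoop's default):

```properties
# Rolling appender from the log4j "extras" companion (org.apache.log4j.rolling);
# the stock 1.2.15 RollingFileAppender does not compress rolled files.
log4j.appender.RFA=org.apache.log4j.rolling.RollingFileAppender
log4j.appender.RFA.rollingPolicy=org.apache.log4j.rolling.TimeBasedRollingPolicy
# A FileNamePattern ending in .gz makes log4j gzip each rolled file.
log4j.appender.RFA.rollingPolicy.FileNamePattern=/var/log/hadoop/hadoop.%d{yyyy-MM-dd}.log.gz
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
```

Because this stays on the 1.2.x line, it avoids the question of whether Hadoop is compatible with log4j 1.3.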
Versions of Jetty Log4j in CDHu3
Hi, I was wondering about the release versions of the Jetty and log4j components shipped as part of the CDHu3 release package. Can someone please let me know? Thanks.
legacy hadoop versions
Is there an Apache Hadoop policy on maintenance/support of older Hadoop versions? It seems like 0.20.20* (now 1.0), 0.22, and 0.23 are the currently active branches. Regarding versions like 0.18 and 0.19: is there some policy, such as "up to N years" or "up to M releases back", under which legacy versions are still maintained? George
Re: hadoop versions
On Sun, Nov 15, 2009 at 6:56 PM, Mark Kerzner markkerz...@gmail.com wrote:
Oops, I just switched to 0.19. There is no .20 on EC2 that I see. Can I risk it - I have a few MR-specific things - or go back to 0.18?

You can use 0.20 on EC2 using Cloudera's scripts. I wouldn't necessarily recommend moving back to 0.18.3 at this point for new development. For new projects, unless you need the absolute most stable release, I'd recommend 0.20.1, which is being used successfully in production by many organizations.

-Todd
Re: hadoop versions
Hi Todd,

Does Cloudera's 0.20.1 support splitting bz2 files?

Thanks,
Usman
Re: hadoop versions
Hi Mark,

The simple answer is yes; to be safest, they should match.

In truth, the answer is a bit more complex. Since Java is dynamically linked (classloaded) at runtime, as long as the method signatures and class names you're using in your code haven't changed between versions, your jar compiled against one version will run against another. Between different versions of the same major release (e.g. 0.20 to 0.20.1) this is almost always the case, except for the occasional backported new API in a later version. So if you've compiled against 0.20.0 and then run against 0.20.1, or a Cloudera build like 0.20.1+152, you should be fine.

If you're using a jar compiled against 0.18.3 and trying to run on 0.20, though, your luck will be much more varied. Most of the APIs from 0.18.3 are still present, but there are a few things that will break with strange errors. So I *strongly* recommend that you compile against the same major version you plan on running against.

All of the above refers to the case where you compile a jar locally, copy it to the cluster, and use the cluster's "hadoop jar foo.jar" command to submit the job. If you're trying to use your local Hadoop installation pointed at the jobtracker of the remote cluster, the requirements are a bit more strict: the protocol version numbers must match, which means you *must* run the same major release (0.18 Hadoop cannot submit to an 0.20 cluster).

Thanks,
-Todd

On Sun, Nov 15, 2009 at 1:37 PM, Mark Kerzner markkerz...@gmail.com wrote:
Hi, when I am building my jar for a MapReduce job, I include the version of Hadoop I am running on my workstation. When I run Hadoop on a cluster, there is a version that runs on the cluster. Do they have to match? In other words, how does my Hadoop jar interact with the cluster's Hadoop? Thank you, Mark
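Todd's two rules (recompiling within the same release line generally works, but job submission over RPC requires the same major release on both sides) can be sketched as a small version check. These helpers are illustrative only; the names and logic are mine, not part of any Hadoop API:

```python
# Hypothetical sketch of the compatibility rule described above; these
# helpers are illustrative, not part of Hadoop itself.

def major_release(version):
    """Reduce a Hadoop version string to its 'major release' pair,
    e.g. '0.20.1' -> (0, 20). Vendor build suffixes such as
    '0.20.1+152' are ignored."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return (int(major), int(minor))

def can_submit(client_version, cluster_version):
    """RPC protocol versions must match, which in practice means the
    client and cluster must run the same major release."""
    return major_release(client_version) == major_release(cluster_version)

# Recompiling within a release line is fine:
assert can_submit("0.20.0", "0.20.1")
assert can_submit("0.20.0", "0.20.1+152")
# But an 0.18 client cannot submit to an 0.20 cluster:
assert not can_submit("0.18.3", "0.20.1")
```

Note this only captures the wire-protocol rule; API-level compatibility between releases is looser, as described above.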
Re: hadoop versions
A bit more specifically: I am planning to use 0.19.0 for the EC2 cluster, because that's the only .19 version available on EC2, and the version on my workstation is 0.19.2, because it was easily available for download. Since my complete cluster runs on 0.19.0, and my jar is also there, I should have no problem, right?

Thank you,
Mark
Re: hadoop versions
Hi Mark,

You should be good to go. However, I wouldn't recommend running 0.19 at all. Upgrade to 0.20 - 0.19 has lots of bugs and I know of few people running it in production, 0.19.0 especially.

-Todd
Re: hadoop versions
Oops, I just switched to 0.19. There is no .20 on EC2 that I see. Can I risk it - I have a few MR-specific things - or go back to 0.18?
Re: discyp between different versions of Hadoop...
You pretty much have to stage the files through something. If you can make the source version of Hadoop's FUSE mount work, you can copy in, using the FUSE mount as a source.

-- Pro Hadoop, a book to guide you from beginner to hadoop mastery, http://www.amazon.com/dp/1430219424?tag=jewlerymall www.prohadoopbook.com a community for Hadoop Professionals
Re: discyp between different versions of Hadoop...
On Sun, 6 Sep 2009 22:45:28 -0700 (PDT), C G parallel...@yahoo.com wrote:
Hi All: Does anybody know if it's possible to distcp between an old version of Hadoop (0.15.x, for example) and a modern version (0.19.2)?

Yes:
1) Run the distcp job on the newer cluster.
2) Use the hftp method in the source URI.

Example:

  hadoop distcp hftp://oldclusternamenode:50070/path/to/src \
      hdfs://newclusternamenode:8020/path/to/dst

See http://hadoop.apache.org/common/docs/r0.20.0/distcp.html#cpver

Cheers,
\EF

-- Erik Forsberg forsb...@opera.com Developer, Opera Software - http://www.opera.com/
Re: discyp between different versions of Hadoop...
Thank you - I didn't think of the hftp interface at all and had completely forgotten about it.
Re: discyp between different versions of Hadoop...
Sorry... the subject should be distcp, obviously... Also, trying to pull into the new grid from the old one yields a java.io.EOFException...

--- On Mon, 9/7/09, C G parallel...@yahoo.com wrote:
From: C G parallel...@yahoo.com
Subject: discyp between different versions of Hadoop...
To: core-u...@hadoop.apache.org
Date: Monday, September 7, 2009, 1:45 AM

Hi All: Does anybody know if it's possible to distcp between an old version of Hadoop (0.15.x, for example) and a modern version (0.19.2)? A quick check trying to move from an old grid to a new one shows an "Incorrect header or version mismatch" error in the new grid's NameNode log, and a SocketTimeoutException on the distcp on the old grid. Any help/info is most appreciated.

Thanks, C G
which versions of pig, nutch and hadoop are required to run together
Hi, I am using Pig 2.0 and Nutch 1.0, but they don't share a common Hadoop version. What is a common Hadoop version for both Pig and Nutch? Please give the Pig version, Nutch version, and Hadoop version that work together. Can anyone help with this?

Thanks, Ramanaiah