+1 (non-binding)
* Built from source
* Deployed on a pseudo-distributed cluster (Mac)
* Ran wordcount and sleep jobs (command sketch below).
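(For reference, a sketch of the kind of commands such a smoke test involves, assuming the example and test jars shipped in the 3.0.0-alpha2 binary tarball; the input/output paths are placeholders:)
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar wordcount /tmp/in /tmp/out
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-3.0.0-alpha2-tests.jar sleep -m 2 -r 1 -mt 1000 -rt 1000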
On Wednesday, January 25, 2017 3:21 AM, Marton Elek <[email protected]>
wrote:
Hi,
I also did a quick smoketest with the provided 3.0.0-alpha2 binaries:
TL;DR: it works well.
Environment:
* 5 hosts, Docker-based Hadoop cluster, every component in a separate container
(5 datanodes / 5 nodemanagers / ...)
* Components:
* HDFS/YARN cluster (upgraded from 2.7.3 to 3.0.0-alpha2 using the binary package
under vote)
* Zeppelin 0.6.2/0.7.0-RC2
* Spark 2.0.2/2.1.0
* HBase 1.2.4 + ZooKeeper
* + additional Docker containers for configuration management and monitoring
* No HA, no Kerberos, no wire encryption
* HDFS cluster upgraded successfully from 2.7.3 (with about 200 GB of data)
* Imported 100 GB of data into HBase successfully
* Started Spark jobs to process 1 GB of JSON from HDFS (using a Spark
master/slave cluster). It worked even when I used Zeppelin 0.6.2 + Spark 2.0.2
(with the old Hadoop client included). Obviously the old version can't use the
new YARN cluster, as the token file format has changed.
* I upgraded my setup to Zeppelin 0.7.0-RC2 / Spark 2.1.0 (distribution
without Hadoop) / Hadoop 3.0.0-alpha2. It also worked well: processed the same
JSON files from HDFS with Spark jobs (from Zeppelin) using the YARN cluster
(master: yarn, deploy-mode: cluster)
* Started Spark jobs (with spark-submit, master: yarn) to count records in
the HBase database: OK (see the sketch after this list)
* Started the example MapReduce jobs from the distribution over YARN. They were
OK, but only with a specific configuration (see below)
So my overall impression is that it works very well (at least with my 'small data').
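(Roughly, the spark-submit invocation mentioned above looks like the following sketch; the class and jar names here are placeholders, not the actual job:)
/opt/spark/bin/spark-submit --master yarn --deploy-mode cluster \
  --class com.example.HBaseRowCount \
  /path/to/hbase-rowcount-job.jar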
Some notes (none of them are blocking):
1. To run the example MapReduce jobs I had to define HADOOP_MAPRED_HOME on the
command line:
./bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar pi \
  -Dyarn.app.mapreduce.am.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}" \
  -Dmapreduce.admin.user.env="HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}" 10 10
And in the yarn-site:
yarn.nodemanager.env-whitelist:
JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,MAPRED_HOME_DIR
I don't know the exact reason for the change, but 2.7.3 was more user-friendly,
as the examples could be run without any specific configuration.
For the same reason I couldn't start an HBase MapReduce job with the hbase
command-line app (there could be some option for HBase to define MAPRED_HOME_DIR
as well, but by default I got a ClassNotFoundException for one of the MR classes).
A sketch of a possible persistent configuration follows below.
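(A possible way to avoid passing the -D options on every invocation would be to set the same two values once in mapred-site; this is only an untested sketch, using the same properties and notation as above:)
yarn.app.mapreduce.am.env: HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}
mapreduce.admin.user.env: HADOOP_MAPRED_HOME={{HADOOP_COMMON_HOME}}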
2. For the record: the logging and htrace classes are excluded from the shaded
Hadoop client jars, so I added them manually, one by one, to Spark (the Spark
2.1.0 distribution without Hadoop):
RUN wget `cat url` -O spark.tar.gz && tar zxf spark.tar.gz && rm spark.tar.gz && mv spark* spark
RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-api-3.0.0-alpha2.jar /opt/spark/jars
RUN cp /opt/hadoop/share/hadoop/client/hadoop-client-runtime-3.0.0-alpha2.jar /opt/spark/jars
ADD https://repo1.maven.org/maven2/org/slf4j/slf4j-log4j12/1.7.10/slf4j-log4j12-1.7.10.jar /opt/spark/jars
ADD https://repo1.maven.org/maven2/org/apache/htrace/htrace-core4/4.1.0-incubating/htrace-core4-4.1.0-incubating.jar /opt/spark/jars
ADD https://repo1.maven.org/maven2/org/slf4j/slf4j-api/1.7.10/slf4j-api-1.7.10.jar /opt/spark/jars/
ADD https://repo1.maven.org/maven2/log4j/log4j/1.2.17/log4j-1.2.17.jar /opt/spark/jars
With these jar files, Spark 2.1.0 works well with the alpha2 version of HDFS and
YARN.
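(An alternative to copying the jars one by one, for the 'without hadoop' Spark distribution, should be pointing Spark at the Hadoop classpath in conf/spark-env.sh; an untested sketch, assuming Hadoop lives under /opt/hadoop as in the Dockerfile above:)
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)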
3. The message "Upgrade in progress. Not yet finalized." did not disappear
from the NameNode web UI, but the cluster works well.
Most probably I missed a step, but it's a little bit confusing.
(I checked the REST call; it is the JMX bean that reports the upgrade as not yet
finalized, so the code of the web page itself seems to be OK.)
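(If the upgrade really was just never finalized, I would expect the banner to disappear after finalizing it explicitly, e.g.:)
bin/hdfs dfsadmin -finalizeUpgrade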
Regards
Marton
On Jan 25, 2017, at 8:38 AM, Yongjun Zhang
<[email protected]> wrote:
Thanks much, Andrew, for the work here!
+1 (binding).
- Downloaded both binary and src tarballs
- Verified md5 checksum and signature for both
- Built from source tarball
- Deployed 2 pseudo clusters, one with the released tarball and the other
with what I built from source, and did the following on both:
- Ran basic HDFS operations, snapshots, and distcp jobs (see the command sketch below)
- Ran the pi job
- Examined the HDFS and YARN web UIs.
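(For reference, a sketch of the kind of commands this covers; the paths are placeholders:)
bin/hdfs dfs -mkdir -p /tmp/snaptest && bin/hdfs dfs -put README.txt /tmp/snaptest
bin/hdfs dfsadmin -allowSnapshot /tmp/snaptest
bin/hdfs dfs -createSnapshot /tmp/snaptest s1
bin/hadoop distcp /tmp/snaptest /tmp/snaptest-copy
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar pi 10 100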
Best,
--Yongjun
On Tue, Jan 24, 2017 at 3:56 PM, Eric Badger
<[email protected]>
wrote:
+1 (non-binding)
- Verified signatures and md5
- Built from source
- Started single-node cluster on my Mac
- Ran some sleep jobs
Eric
On Tuesday, January 24, 2017 4:32 PM, Yufei Gu
<[email protected]>
wrote:
Hi Andrew,
Thanks for working on this.
+1 (Non-Binding)
1. Downloaded the binary and verified the md5.
2. Deployed it on a 3-node cluster with 1 ResourceManager and 2 NodeManagers.
3. Set YARN to use the Fair Scheduler (sketch below).
4. Ran the MapReduce Pi job.
5. Verified that the hadoop version command output is correct.
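(For reference, a sketch of the standard Fair Scheduler setting in yarn-site, shown in the same notation used earlier in the thread, plus the version check:)
yarn.resourcemanager.scheduler.class: org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler
bin/hadoop version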
Best,
Yufei
On Tue, Jan 24, 2017 at 3:02 AM, Marton Elek
<[email protected]>
wrote:
The minicluster jar is kind of weird on filesystems that aren't case-sensitive,
like OS X's default HFS+:
$ jar tf hadoop-client-minicluster-3.0.0-alpha3-SNAPSHOT.jar | grep -i license
LICENSE.txt
license/
license/LICENSE
license/LICENSE.dom-documentation.txt
license/LICENSE.dom-software.txt
license/LICENSE.sax.txt
license/NOTICE
license/README.dom.txt
license/README.sax.txt
LICENSE
Grizzly_THIRDPARTYLICENSEREADME.txt
I added a patch to https://issues.apache.org/jira/browse/HADOOP-14018 to
add the missing META-INF/LICENSE.txt to the shaded files.
Question: what should be done with the other LICENSE files in the
minicluster? Can we just exclude them (from a legal point of view)?
Regards,
Marton