Related to speed of execution of Job in Amazon Elastic MapReduce
My task is: 1) Initially, import the data from MS SQL Server into HDFS using Sqoop. 2) Process the data through Hive and generate the result in one table. 3) Export that result table from Hive back to MS SQL Server.

I want to perform all of this using Amazon Elastic MapReduce. The data I am importing from MS SQL Server is very large (about 500,000 entries in one table, and likewise I have 30 tables). For this I have written a task in Hive which contains only queries, and each query uses a lot of joins. Because of this, performance on my single local machine is very poor (it takes about 3 hours to execute completely). I want to reduce that time as much as possible, which is why we decided to use Amazon Elastic MapReduce.

Currently I am using 3 m1.large instances and still get the same performance as on my local machine. How many instances should I use to improve performance? As I add instances, are they configured automatically, or do I need to specify something while submitting the JAR for execution? I ask because with two machines the time is the same. Also, is there any other way to improve performance, or is increasing the number of instances the only option? Or am I doing something wrong while executing the JAR? Please guide me through this, as I don't know much about the Amazon servers.

Thanks.
--
Regards,
Bhavesh Shah
Build failed in Jenkins: Hadoop-Common-trunk #396
See https://builds.apache.org/job/Hadoop-Common-trunk/396/changes

Changes:

[tucu] HADOOP-8343. Allow configuration of authorization for JmxJsonServlet and MetricsServlet (tucu)
[tucu] MAPREDUCE-4205. retrofit all JVM shutdown hooks to use ShutdownHookManager (tucu)
[tucu] HADOOP-8355. SPNEGO filter throws/logs exception when authentication fails (tucu)
[tucu] HADOOP-8356. FileSystem service loading mechanism should print the FileSystem impl it is failing to load (tucu)
[szetszwo] HDFS-3350. In INode, add final to compareTo(..), equals(..) and hashCode(), and remove synchronized from updatePermissionStatus(..).
[todd] Amend previous commit of HADOOP-8350 (missed new SocketInputWrapper file)
[todd] HADOOP-8350. Improve NetUtils.getInputStream to return a stream which has a tunable timeout. Contributed by Todd Lipcon.
[todd] HDFS-3359. DFSClient.close should close cached sockets. Contributed by Todd Lipcon.
[umamahesh] HDFS-3332. NullPointerException in DN when directoryscanner is trying to report bad blocks. Contributed by Amith D K.
[bobby] MAPREDUCE-4163. consistently set the bind address (Daryn Sharp via bobby)
[ddas] HADOOP-8346. Makes oid changes to make SPNEGO work. Was broken due to fixes introduced by the IBM JDK compatibility patch. Contributed by Devaraj Das.

--
[...truncated 45828 lines...]
[DEBUG] (f) reactorProjects = [MavenProject: org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-annotations/pom.xml, MavenProject: org.apache.hadoop:hadoop-auth:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth/pom.xml, MavenProject: org.apache.hadoop:hadoop-auth-examples:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/pom.xml, MavenProject: org.apache.hadoop:hadoop-common:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-common/pom.xml, MavenProject: org.apache.hadoop:hadoop-common-project:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/pom.xml]
[DEBUG] (f) useDefaultExcludes = true
[DEBUG] (f) useDefaultManifestFile = false
[DEBUG] -- end configuration --
[INFO]
[INFO] --- maven-enforcer-plugin:1.0:enforce (dist-enforce) @ hadoop-common-project ---
[DEBUG] Configuring mojo org.apache.maven.plugins:maven-enforcer-plugin:1.0:enforce from plugin realm ClassRealm[plugin>org.apache.maven.plugins:maven-enforcer-plugin:1.0, parent: sun.misc.Launcher$AppClassLoader@126b249]
[DEBUG] Configuring mojo 'org.apache.maven.plugins:maven-enforcer-plugin:1.0:enforce' with basic configurator -->
[DEBUG] (s) fail = true
[DEBUG] (s) failFast = false
[DEBUG] (f) ignoreCache = false
[DEBUG] (s) project = MavenProject: org.apache.hadoop:hadoop-common-project:3.0.0-SNAPSHOT @ https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/pom.xml
[DEBUG] (s) version = [3.0.2,)
[DEBUG] (s) version = 1.6
[DEBUG] (s) rules = [org.apache.maven.plugins.enforcer.RequireMavenVersion@65ae5c, org.apache.maven.plugins.enforcer.RequireJavaVersion@19a52a3]
[DEBUG] (s) session = org.apache.maven.execution.MavenSession@107b56e
[DEBUG] (s) skip = false
[DEBUG] -- end configuration --
[DEBUG] Executing rule: org.apache.maven.plugins.enforcer.RequireMavenVersion
[DEBUG] Rule org.apache.maven.plugins.enforcer.RequireMavenVersion is cacheable.
[DEBUG] Key org.apache.maven.plugins.enforcer.RequireMavenVersion -937312197 was found in the cache
[DEBUG] The cached results are still valid. Skipping the rule: org.apache.maven.plugins.enforcer.RequireMavenVersion
[DEBUG] Executing rule: org.apache.maven.plugins.enforcer.RequireJavaVersion
[DEBUG] Rule org.apache.maven.plugins.enforcer.RequireJavaVersion is cacheable.
[DEBUG] Key org.apache.maven.plugins.enforcer.RequireJavaVersion 48569 was found in the cache
[DEBUG] The cached results are still valid. Skipping the rule: org.apache.maven.plugins.enforcer.RequireJavaVersion
[INFO]
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ hadoop-common-project ---
[DEBUG] Configuring mojo org.apache.maven.plugins:maven-site-plugin:3.0:attach-descriptor from plugin realm ClassRealm[plugin>org.apache.maven.plugins:maven-site-plugin:3.0, parent: sun.misc.Launcher$AppClassLoader@126b249]
[DEBUG] Configuring mojo 'org.apache.maven.plugins:maven-site-plugin:3.0:attach-descriptor' with basic configurator -->
[DEBUG] (f) basedir = https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project
[DEBUG] (f) inputEncoding = UTF-8
[DEBUG] (f) localRepository = id: local
url: file:///home/jenkins/.m2/repository/
layout: none
[DEBUG] (f) outputEncoding = UTF-8
[DEBUG] (f) pomPackagingOnly = true
[DEBUG] (f)
Build failed in Jenkins: Hadoop-Common-0.23-Build #242
See https://builds.apache.org/job/Hadoop-Common-0.23-Build/242/changes

Changes:

[bobby] MAPREDUCE-4163. consistently set the bind address (Daryn Sharp via bobby)

--
[...truncated 12306 lines...]
[javadoc] Loading source files for package org.apache.hadoop.fs.local...
[javadoc] Loading source files for package org.apache.hadoop.fs.permission...
[javadoc] Loading source files for package org.apache.hadoop.fs.s3...
[javadoc] Loading source files for package org.apache.hadoop.fs.s3native...
[javadoc] Loading source files for package org.apache.hadoop.fs.shell...
[javadoc] Loading source files for package org.apache.hadoop.fs.viewfs...
[javadoc] Loading source files for package org.apache.hadoop.http...
[javadoc] Loading source files for package org.apache.hadoop.http.lib...
[javadoc] Loading source files for package org.apache.hadoop.io...
[javadoc] Loading source files for package org.apache.hadoop.io.compress...
[javadoc] Loading source files for package org.apache.hadoop.io.compress.bzip2...
[javadoc] Loading source files for package org.apache.hadoop.io.compress.lz4...
[javadoc] Loading source files for package org.apache.hadoop.io.compress.snappy...
[javadoc] Loading source files for package org.apache.hadoop.io.compress.zlib...
[javadoc] Loading source files for package org.apache.hadoop.io.file.tfile...
[javadoc] Loading source files for package org.apache.hadoop.io.nativeio...
[javadoc] Loading source files for package org.apache.hadoop.io.retry...
[javadoc] Loading source files for package org.apache.hadoop.io.serializer...
[javadoc] Loading source files for package org.apache.hadoop.io.serializer.avro...
[javadoc] Loading source files for package org.apache.hadoop.ipc...
[javadoc] Loading source files for package org.apache.hadoop.ipc.metrics...
[javadoc] Loading source files for package org.apache.hadoop.jmx...
[javadoc] Loading source files for package org.apache.hadoop.log...
[javadoc] Loading source files for package org.apache.hadoop.log.metrics...
[javadoc] Loading source files for package org.apache.hadoop.metrics...
[javadoc] Loading source files for package org.apache.hadoop.metrics.file...
[javadoc] Loading source files for package org.apache.hadoop.metrics.ganglia...
[javadoc] Loading source files for package org.apache.hadoop.metrics.jvm...
[javadoc] Loading source files for package org.apache.hadoop.metrics.spi...
[javadoc] Loading source files for package org.apache.hadoop.metrics.util...
[javadoc] Loading source files for package org.apache.hadoop.metrics2...
[javadoc] Loading source files for package org.apache.hadoop.metrics2.annotation...
[javadoc] Loading source files for package org.apache.hadoop.metrics2.filter...
[javadoc] Loading source files for package org.apache.hadoop.metrics2.impl...
[javadoc] Loading source files for package org.apache.hadoop.metrics2.lib...
[javadoc] Loading source files for package org.apache.hadoop.metrics2.sink...
[javadoc] Loading source files for package org.apache.hadoop.metrics2.sink.ganglia...
[javadoc] Loading source files for package org.apache.hadoop.metrics2.source...
[javadoc] Loading source files for package org.apache.hadoop.metrics2.util...
[javadoc] Loading source files for package org.apache.hadoop.net...
[javadoc] Loading source files for package org.apache.hadoop.record...
[javadoc] Loading source files for package org.apache.hadoop.record.compiler...
[javadoc] Loading source files for package org.apache.hadoop.record.compiler.ant...
[javadoc] Loading source files for package org.apache.hadoop.record.compiler.generated...
[javadoc] Loading source files for package org.apache.hadoop.record.meta...
[javadoc] Loading source files for package org.apache.hadoop.security...
[javadoc] Loading source files for package org.apache.hadoop.security.authorize...
[javadoc] Loading source files for package org.apache.hadoop.security.token...
[javadoc] Loading source files for package org.apache.hadoop.security.token.delegation...
[javadoc] Loading source files for package org.apache.hadoop.tools...
[javadoc] Loading source files for package org.apache.hadoop.util...
[javadoc] Loading source files for package org.apache.hadoop.util.bloom...
[javadoc] Loading source files for package org.apache.hadoop.util.hash...
[javadoc] 2 errors
[xslt] Processing https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-common-project/hadoop-common/target/findbugsXml.xml to https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-common-project/hadoop-common/target/site/findbugs.html
[xslt] Loading stylesheet /home/jenkins/tools/findbugs/latest/src/xsl/default.xsl
[INFO] Executed tasks
[INFO]
[INFO] --- maven-antrun-plugin:1.6:run (pre-dist) @ hadoop-common ---
[INFO] Executing tasks
main:
[INFO] Executed tasks
[INFO]
[INFO]
[jira] [Created] (HADOOP-8359) Clear up javadoc warnings in hadoop-common-project
Harsh J created HADOOP-8359:
---
Summary: Clear up javadoc warnings in hadoop-common-project
Key: HADOOP-8359
URL: https://issues.apache.org/jira/browse/HADOOP-8359
Project: Hadoop Common
Issue Type: Task
Components: conf
Affects Versions: 2.0.0
Reporter: Harsh J
Priority: Trivial

The javadocs added in HADOOP-8172 have introduced two new javadoc warnings. These should be easy to fix (just missing '#'s for the method refs).

{code}
[WARNING] Javadoc Warnings
[WARNING] /Users/harshchouraria/Work/code/apache/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java:334: warning - Tag @link: missing '#': addDeprecation(String key, String newKey)
[WARNING] /Users/harshchouraria/Work/code/apache/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java:285: warning - Tag @link: missing '#': addDeprecation(String key, String newKey, String customMessage)
{code}

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
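For illustration, the difference the warning points at can be sketched as follows. This is a hypothetical snippet, not the actual Configuration.java patch; the '#' is what tells javadoc the target is a member of the enclosing class:

```java
// Warns: no '#' before the method name, so javadoc cannot resolve the reference
/** See {@link addDeprecation(String key, String newKey)}. */

// Clean: '#' marks addDeprecation as a member of the current class
/** See {@link #addDeprecation(String, String)}. */
```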
[jira] [Created] (HADOOP-8360) empty-configuration.xml fails xml validation
Radim Kolar created HADOOP-8360:
---
Summary: empty-configuration.xml fails xml validation
Key: HADOOP-8360
URL: https://issues.apache.org/jira/browse/HADOOP-8360
Project: Hadoop Common
Issue Type: Bug
Reporter: Radim Kolar
Priority: Minor
Attachments: invalid-xml.txt

/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml

The <?xml ...?> declaration can't follow a comment.
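For context, the XML specification requires the <?xml ...?> declaration, when present, to be the very first thing in the document, before any comment. A minimal illustration (hypothetical file content, not the actual attachment):

```xml
<!-- Invalid: a comment before the XML declaration fails validation -->
<?xml version="1.0"?>
<configuration/>
```

The valid ordering puts the declaration first:

```xml
<?xml version="1.0"?>
<!-- A license header comment is fine here -->
<configuration/>
```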
[jira] [Created] (HADOOP-8361) avoid out-of-memory problems when deserializing strings
Colin Patrick McCabe created HADOOP-8361:
---
Summary: avoid out-of-memory problems when deserializing strings
Key: HADOOP-8361
URL: https://issues.apache.org/jira/browse/HADOOP-8361
Project: Hadoop Common
Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor

In HDFS, we want to be able to read the edit log without crashing on an OOM condition. Unfortunately, we currently cannot do this, because there are no limits on the length of certain data types we pull from the edit log. We often read strings without setting any upper limit on the length we're prepared to accept. It's not that we don't have limits on strings; for example, HDFS limits the maximum path length to 8000 UCS-2 characters, and Linux limits the maximum user name length to either 64 or 128 bytes, depending on what version you are running. It's just that we're not exposing these limits to the deserialization functions that need to be aware of them.
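The idea described above can be sketched as follows: validate a length prefix against a known limit before allocating any buffer. This is a minimal hypothetical illustration (class and method names invented here), not the actual HDFS edit-log reader:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch of length-limited string deserialization: reject an oversized
// length field before allocating, so a corrupt or hostile value in the
// stream cannot trigger a huge allocation and an OOM.
public class BoundedStringReader {
    public static String readBoundedString(DataInputStream in, int maxLength)
            throws IOException {
        int length = in.readInt();
        if (length < 0 || length > maxLength) {
            throw new IOException(
                "string length " + length + " exceeds the limit of " + maxLength);
        }
        byte[] buf = new byte[length]; // bounded by maxLength, safe to allocate
        in.readFully(buf);
        return new String(buf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A well-formed record: 4-byte big-endian length prefix, then UTF-8 bytes
        byte[] record = {0, 0, 0, 3, 'f', 'o', 'o'};
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        System.out.println(readBoundedString(in, 8000)); // prints "foo"
    }
}
```

The limit passed in (8000 here, matching the HDFS path-length bound mentioned above) would come from whatever bound applies to the field being read.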
[jira] [Created] (HADOOP-8362) Improve exception message when Configuration.set() is called with a null key or value
Todd Lipcon created HADOOP-8362:
---
Summary: Improve exception message when Configuration.set() is called with a null key or value
Key: HADOOP-8362
URL: https://issues.apache.org/jira/browse/HADOOP-8362
Project: Hadoop Common
Issue Type: Improvement
Components: conf
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Priority: Trivial

Currently, calling Configuration.set(...) with a null value results in a NullPointerException within Properties.setProperty. We should check for null key/value and throw a better exception.
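A minimal sketch of the proposed check. The class name here is hypothetical; the real change would go into org.apache.hadoop.conf.Configuration.set():

```java
import java.util.Properties;

// Sketch: validate key/value before delegating to Properties, so callers
// get a descriptive error instead of an NPE from deep inside setProperty.
public class SafeConfiguration {
    private final Properties props = new Properties();

    public void set(String name, String value) {
        if (name == null) {
            throw new IllegalArgumentException("Property name must not be null");
        }
        if (value == null) {
            throw new IllegalArgumentException(
                "The value of property " + name + " must not be null");
        }
        props.setProperty(name, value);
    }

    public String get(String name) {
        return props.getProperty(name);
    }

    public static void main(String[] args) {
        SafeConfiguration conf = new SafeConfiguration();
        conf.set("fs.default.name", "hdfs://localhost:8020");
        System.out.println(conf.get("fs.default.name")); // prints "hdfs://localhost:8020"
    }
}
```

The benefit is purely diagnostic: the exception now names the offending property instead of pointing at a line inside java.util.Properties.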
[jira] [Created] (HADOOP-8363) publish Hadoop-* sources and javadoc to maven repositories.
Jonathan Hsieh created HADOOP-8363:
--
Summary: publish Hadoop-* sources and javadoc to maven repositories.
Key: HADOOP-8363
URL: https://issues.apache.org/jira/browse/HADOOP-8363
Project: Hadoop Common
Issue Type: New Feature
Reporter: Jonathan Hsieh

I believe the hadoop 1.0.x series does not have the source jars published on maven repos.

{code}
hbase-trunk$ mvn eclipse:eclipse -DdownloadSources -DdownloadJavadocs
...
[INFO] Wrote Eclipse project for hbase to /home/jon/proj/hbase-trunk.
[INFO] Sources for some artifacts are not available. List of artifacts without a source archive:
o org.apache.hadoop:hadoop-core:1.0.2
o org.apache.hadoop:hadoop-test:1.0.2
{code}

It would be great if the poms were set up so that this would pull in the source jars as well! I believe this is already in place for the 0.23/2.x release lines.

-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
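For reference, one common way to attach -sources.jar and -javadoc.jar artifacts during a Maven build is via the maven-source-plugin and maven-javadoc-plugin. This is a generic pom sketch (plugin versions omitted), not the actual Hadoop 1.0.x pom change:

```xml
<build>
  <plugins>
    <!-- Attach a -sources.jar alongside the main artifact -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-source-plugin</artifactId>
      <executions>
        <execution>
          <id>attach-sources</id>
          <goals>
            <goal>jar-no-fork</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
    <!-- Attach a -javadoc.jar alongside the main artifact -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-javadoc-plugin</artifactId>
      <executions>
        <execution>
          <id>attach-javadoc</id>
          <goals>
            <goal>jar</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With these executions bound, `mvn deploy` publishes the sources and javadoc jars next to the main artifact, which is what lets `mvn eclipse:eclipse -DdownloadSources -DdownloadJavadocs` resolve them.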