Related to speed of execution of a job in Amazon Elastic MapReduce

2012-05-04 Thread Bhavesh Shah
My task is:

1) Initially I want to import the data from MS SQL Server into HDFS using
Sqoop (a sketch of this step follows below).

2) Through Hive I process the data and generate the result in one table.

3) That resulting table from Hive is then exported back to MS SQL Server.

I want to perform all of this using Amazon Elastic MapReduce.
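
For reference, step 1 is a Sqoop import along these lines (the connection
string, credentials, table name, target directory, and mapper count below are
placeholders, not my real values):

    sqoop import \
      --connect "jdbc:sqlserver://<host>:1433;databaseName=<db>" \
      --username <user> --password <pass> \
      --table <table> \
      --target-dir /user/hive/warehouse/<table> \
      --num-mappers 4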


The data which I am importing from MS SQL Server is very large (about 500,000
rows in one table, and likewise I have 30 tables). For this I have written a
task in Hive which contains only queries (and each query uses a lot of joins).
Because of this the performance is very poor on my single local machine (it
takes about 3 hours to execute completely).

I want to reduce that time as much as possible. For that we have decided to
use Amazon Elastic MapReduce. Currently I am using 3 m1.large instances and
still get the same performance as on my local machine.

In order to improve the performance, how many instances should I use? And
whatever number of instances I use, are they configured automatically, or do I
need to specify the count when submitting the JAR for execution? I ask because
even with two machines the time is the same.
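
For reference, a minimal sketch of launching such a job flow with the
elastic-mapreduce Ruby CLI (flag names written from memory and not verified
against the current CLI docs; the job flow name is a placeholder):

    elastic-mapreduce --create --alive --name "hive-sqoop-job" \
      --num-instances 3 --instance-type m1.large --hive-interactive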

Also, is there any other way to improve the performance besides increasing the
number of instances? Or am I doing something wrong while executing the JAR?

Please guide me through this, as I don't know much about the Amazon servers.

Thanks.


-- 
Regards,
Bhavesh Shah


Build failed in Jenkins: Hadoop-Common-trunk #396

2012-05-04 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-trunk/396/changes

Changes:

[tucu] HADOOP-8343. Allow configuration of authorization for JmxJsonServlet and 
MetricsServlet (tucu)

[tucu] MAPREDUCE-4205. retrofit all JVM shutdown hooks to use 
ShutdownHookManager (tucu)

[tucu] HADOOP-8355. SPNEGO filter throws/logs exception when authentication 
fails (tucu)

[tucu] HADOOP-8356. FileSystem service loading mechanism should print the 
FileSystem impl it is failing to load (tucu)

[szetszwo] HDFS-3350. In INode, add final to compareTo(..), equals(..) and 
hashCode(), and remove synchronized from updatePermissionStatus(..).

[todd] Amend previous commit of HADOOP-8350 (missed new SocketInputWrapper file)

[todd] HADOOP-8350. Improve NetUtils.getInputStream to return a stream which 
has a tunable timeout. Contributed by Todd Lipcon.

[todd] HDFS-3359. DFSClient.close should close cached sockets. Contributed by 
Todd Lipcon.

[umamahesh] HDFS-3332. NullPointerException in DN when directoryscanner is 
trying to report bad blocks. Contributed by  Amith D K.

[bobby] MAPREDUCE-4163. consistently set the bind address (Daryn Sharp via 
bobby)

[ddas] HADOOP-8346. Makes oid changes to make SPNEGO work. Was broken due to 
fixes introduced by the IBM JDK compatibility patch. Contributed by Devaraj Das.

--
[...truncated 45828 lines...]
[DEBUG]   (f) reactorProjects = [MavenProject: 
org.apache.hadoop:hadoop-annotations:3.0.0-SNAPSHOT @ 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-annotations/pom.xml,
 MavenProject: org.apache.hadoop:hadoop-auth:3.0.0-SNAPSHOT @ 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth/pom.xml,
 MavenProject: org.apache.hadoop:hadoop-auth-examples:3.0.0-SNAPSHOT @ 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-auth-examples/pom.xml,
 MavenProject: org.apache.hadoop:hadoop-common:3.0.0-SNAPSHOT @ 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/hadoop-common/pom.xml,
 MavenProject: org.apache.hadoop:hadoop-common-project:3.0.0-SNAPSHOT @ 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/pom.xml]
[DEBUG]   (f) useDefaultExcludes = true
[DEBUG]   (f) useDefaultManifestFile = false
[DEBUG] -- end configuration --
[INFO] 
[INFO] --- maven-enforcer-plugin:1.0:enforce (dist-enforce) @ 
hadoop-common-project ---
[DEBUG] Configuring mojo 
org.apache.maven.plugins:maven-enforcer-plugin:1.0:enforce from plugin realm 
ClassRealm[plugin>org.apache.maven.plugins:maven-enforcer-plugin:1.0, parent: 
sun.misc.Launcher$AppClassLoader@126b249]
[DEBUG] Configuring mojo 
'org.apache.maven.plugins:maven-enforcer-plugin:1.0:enforce' with basic 
configurator --
[DEBUG]   (s) fail = true
[DEBUG]   (s) failFast = false
[DEBUG]   (f) ignoreCache = false
[DEBUG]   (s) project = MavenProject: 
org.apache.hadoop:hadoop-common-project:3.0.0-SNAPSHOT @ 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project/pom.xml
[DEBUG]   (s) version = [3.0.2,)
[DEBUG]   (s) version = 1.6
[DEBUG]   (s) rules = 
[org.apache.maven.plugins.enforcer.RequireMavenVersion@65ae5c, 
org.apache.maven.plugins.enforcer.RequireJavaVersion@19a52a3]
[DEBUG]   (s) session = org.apache.maven.execution.MavenSession@107b56e
[DEBUG]   (s) skip = false
[DEBUG] -- end configuration --
[DEBUG] Executing rule: org.apache.maven.plugins.enforcer.RequireMavenVersion
[DEBUG] Rule org.apache.maven.plugins.enforcer.RequireMavenVersion is cacheable.
[DEBUG] Key org.apache.maven.plugins.enforcer.RequireMavenVersion -937312197 
was found in the cache
[DEBUG] The cached results are still valid. Skipping the rule: 
org.apache.maven.plugins.enforcer.RequireMavenVersion
[DEBUG] Executing rule: org.apache.maven.plugins.enforcer.RequireJavaVersion
[DEBUG] Rule org.apache.maven.plugins.enforcer.RequireJavaVersion is cacheable.
[DEBUG] Key org.apache.maven.plugins.enforcer.RequireJavaVersion 48569 was 
found in the cache
[DEBUG] The cached results are still valid. Skipping the rule: 
org.apache.maven.plugins.enforcer.RequireJavaVersion
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ 
hadoop-common-project ---
[DEBUG] Configuring mojo 
org.apache.maven.plugins:maven-site-plugin:3.0:attach-descriptor from plugin 
realm ClassRealm[plugin>org.apache.maven.plugins:maven-site-plugin:3.0, parent: 
sun.misc.Launcher$AppClassLoader@126b249]
[DEBUG] Configuring mojo 
'org.apache.maven.plugins:maven-site-plugin:3.0:attach-descriptor' with basic 
configurator --
[DEBUG]   (f) basedir = 
https://builds.apache.org/job/Hadoop-Common-trunk/ws/trunk/hadoop-common-project
[DEBUG]   (f) inputEncoding = UTF-8
[DEBUG]   (f) localRepository =id: local
  url: file:///home/jenkins/.m2/repository/
   layout: none

[DEBUG]   (f) outputEncoding = UTF-8
[DEBUG]   (f) pomPackagingOnly = true
[DEBUG]   (f) 

Build failed in Jenkins: Hadoop-Common-0.23-Build #242

2012-05-04 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Common-0.23-Build/242/changes

Changes:

[bobby] MAPREDUCE-4163. consistently set the bind address (Daryn Sharp via 
bobby)

--
[...truncated 12306 lines...]
  [javadoc] Loading source files for package org.apache.hadoop.fs.local...
  [javadoc] Loading source files for package org.apache.hadoop.fs.permission...
  [javadoc] Loading source files for package org.apache.hadoop.fs.s3...
  [javadoc] Loading source files for package org.apache.hadoop.fs.s3native...
  [javadoc] Loading source files for package org.apache.hadoop.fs.shell...
  [javadoc] Loading source files for package org.apache.hadoop.fs.viewfs...
  [javadoc] Loading source files for package org.apache.hadoop.http...
  [javadoc] Loading source files for package org.apache.hadoop.http.lib...
  [javadoc] Loading source files for package org.apache.hadoop.io...
  [javadoc] Loading source files for package org.apache.hadoop.io.compress...
  [javadoc] Loading source files for package 
org.apache.hadoop.io.compress.bzip2...
  [javadoc] Loading source files for package 
org.apache.hadoop.io.compress.lz4...
  [javadoc] Loading source files for package 
org.apache.hadoop.io.compress.snappy...
  [javadoc] Loading source files for package 
org.apache.hadoop.io.compress.zlib...
  [javadoc] Loading source files for package org.apache.hadoop.io.file.tfile...
  [javadoc] Loading source files for package org.apache.hadoop.io.nativeio...
  [javadoc] Loading source files for package org.apache.hadoop.io.retry...
  [javadoc] Loading source files for package org.apache.hadoop.io.serializer...
  [javadoc] Loading source files for package 
org.apache.hadoop.io.serializer.avro...
  [javadoc] Loading source files for package org.apache.hadoop.ipc...
  [javadoc] Loading source files for package org.apache.hadoop.ipc.metrics...
  [javadoc] Loading source files for package org.apache.hadoop.jmx...
  [javadoc] Loading source files for package org.apache.hadoop.log...
  [javadoc] Loading source files for package org.apache.hadoop.log.metrics...
  [javadoc] Loading source files for package org.apache.hadoop.metrics...
  [javadoc] Loading source files for package org.apache.hadoop.metrics.file...
  [javadoc] Loading source files for package 
org.apache.hadoop.metrics.ganglia...
  [javadoc] Loading source files for package org.apache.hadoop.metrics.jvm...
  [javadoc] Loading source files for package org.apache.hadoop.metrics.spi...
  [javadoc] Loading source files for package org.apache.hadoop.metrics.util...
  [javadoc] Loading source files for package org.apache.hadoop.metrics2...
  [javadoc] Loading source files for package 
org.apache.hadoop.metrics2.annotation...
  [javadoc] Loading source files for package 
org.apache.hadoop.metrics2.filter...
  [javadoc] Loading source files for package org.apache.hadoop.metrics2.impl...
  [javadoc] Loading source files for package org.apache.hadoop.metrics2.lib...
  [javadoc] Loading source files for package org.apache.hadoop.metrics2.sink...
  [javadoc] Loading source files for package 
org.apache.hadoop.metrics2.sink.ganglia...
  [javadoc] Loading source files for package 
org.apache.hadoop.metrics2.source...
  [javadoc] Loading source files for package org.apache.hadoop.metrics2.util...
  [javadoc] Loading source files for package org.apache.hadoop.net...
  [javadoc] Loading source files for package org.apache.hadoop.record...
  [javadoc] Loading source files for package 
org.apache.hadoop.record.compiler...
  [javadoc] Loading source files for package 
org.apache.hadoop.record.compiler.ant...
  [javadoc] Loading source files for package 
org.apache.hadoop.record.compiler.generated...
  [javadoc] Loading source files for package org.apache.hadoop.record.meta...
  [javadoc] Loading source files for package org.apache.hadoop.security...
  [javadoc] Loading source files for package 
org.apache.hadoop.security.authorize...
  [javadoc] Loading source files for package org.apache.hadoop.security.token...
  [javadoc] Loading source files for package 
org.apache.hadoop.security.token.delegation...
  [javadoc] Loading source files for package org.apache.hadoop.tools...
  [javadoc] Loading source files for package org.apache.hadoop.util...
  [javadoc] Loading source files for package org.apache.hadoop.util.bloom...
  [javadoc] Loading source files for package org.apache.hadoop.util.hash...
  [javadoc] 2 errors
 [xslt] Processing 
https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-common-project/hadoop-common/target/findbugsXml.xml
 to 
https://builds.apache.org/job/Hadoop-Common-0.23-Build/ws/trunk/hadoop-common-project/hadoop-common/target/site/findbugs.html
 [xslt] Loading stylesheet 
/home/jenkins/tools/findbugs/latest/src/xsl/default.xsl
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-antrun-plugin:1.6:run (pre-dist) @ hadoop-common ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO]  

[jira] [Created] (HADOOP-8359) Clear up javadoc warnings in hadoop-common-project

2012-05-04 Thread Harsh J (JIRA)
Harsh J created HADOOP-8359:
---

 Summary: Clear up javadoc warnings in hadoop-common-project
 Key: HADOOP-8359
 URL: https://issues.apache.org/jira/browse/HADOOP-8359
 Project: Hadoop Common
  Issue Type: Task
  Components: conf
Affects Versions: 2.0.0
Reporter: Harsh J
Priority: Trivial


Javadocs added in HADOOP-8172 have introduced two new javadoc warnings. These 
should be easy to fix (the method references are just missing their '#' prefixes).

{code}
[WARNING] Javadoc Warnings
[WARNING] 
/Users/harshchouraria/Work/code/apache/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java:334:
 warning - Tag @link: missing '#': addDeprecation(String key, String newKey)
[WARNING] 
/Users/harshchouraria/Work/code/apache/hadoop/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/conf/Configuration.java:285:
 warning - Tag @link: missing '#': addDeprecation(String key, String newKey,
[WARNING] String customMessage)
{code}
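
A minimal sketch of the likely fix (the parameter lists are copied from the 
warning output above, not re-checked against the current Configuration.java):

{code}
// before (javadoc warns: missing '#'):
/** ... {@link addDeprecation(String key, String newKey)} ... */

// after (a same-class method reference needs the leading '#'):
/** ... {@link #addDeprecation(String key, String newKey)} ... */
{code}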

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HADOOP-8360) empty-configuration.xml fails xml validation

2012-05-04 Thread Radim Kolar (JIRA)
Radim Kolar created HADOOP-8360:
---

 Summary: empty-configuration.xml fails xml validation
 Key: HADOOP-8360
 URL: https://issues.apache.org/jira/browse/HADOOP-8360
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Radim Kolar
Priority: Minor
 Attachments: invalid-xml.txt

/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml

The <?xml declaration can't follow a comment.
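
For illustration, the validator simply requires that nothing precede the 
declaration; reordering along these lines should fix it (content abbreviated, 
assuming the usual empty <configuration/> root):

{code}
<?xml version="1.0" encoding="UTF-8"?>
<!-- comments (e.g. the license header) must come after the XML declaration -->
<configuration/>
{code}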





[jira] [Created] (HADOOP-8361) avoid out-of-memory problems when deserializing strings

2012-05-04 Thread Colin Patrick McCabe (JIRA)
Colin Patrick McCabe created HADOOP-8361:


 Summary: avoid out-of-memory problems when deserializing strings
 Key: HADOOP-8361
 URL: https://issues.apache.org/jira/browse/HADOOP-8361
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Colin Patrick McCabe
Assignee: Colin Patrick McCabe
Priority: Minor


In HDFS, we want to be able to read the edit log without crashing on an OOM 
condition.  Unfortunately, we currently cannot do this, because there are no 
limits on the length of certain data types we pull from the edit log.  We often 
read strings without setting any upper limit on the length we're prepared to 
accept.

It's not that we don't have limits on strings-- for example, HDFS limits the 
maximum path length to 8000 UCS-2 characters.  Linux limits the maximum user 
name length to either 64 or 128 bytes, depending on what version you are 
running.  It's just that we're not exposing these limits to the deserialization 
functions that need to be aware of them.
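
A minimal sketch of the kind of guard this would add (the helper name and 
framing are hypothetical, not the actual HDFS edit log deserialization API):

{code}
import java.io.DataInput;
import java.io.IOException;

// Fail fast on an implausible length instead of allocating a huge buffer
// and hitting an OOM while reading the edit log.
public static String readBoundedString(DataInput in, int maxLength) throws IOException {
  int length = in.readInt();                 // length prefix written by the serializer
  if (length < 0 || length > maxLength) {
    throw new IOException("string length " + length + " exceeds limit " + maxLength);
  }
  byte[] bytes = new byte[length];
  in.readFully(bytes);
  return new String(bytes, "UTF-8");
}
{code}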





[jira] [Created] (HADOOP-8362) Improve exception message when Configuration.set() is called with a null key or value

2012-05-04 Thread Todd Lipcon (JIRA)
Todd Lipcon created HADOOP-8362:
---

 Summary: Improve exception message when Configuration.set() is 
called with a null key or value
 Key: HADOOP-8362
 URL: https://issues.apache.org/jira/browse/HADOOP-8362
 Project: Hadoop Common
  Issue Type: Improvement
  Components: conf
Affects Versions: 2.0.0
Reporter: Todd Lipcon
Priority: Trivial


Currently, calling Configuration.set(...) with a null value results in a 
NullPointerException within Properties.setProperty. We should check for null 
key/value and throw a better exception.
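
A hedged sketch of the proposed guard at the top of Configuration.set(String, 
String); the exact exception type and message wording are open to discussion:

{code}
public void set(String name, String value) {
  if (name == null) {
    throw new IllegalArgumentException("Property name must not be null");
  }
  if (value == null) {
    throw new IllegalArgumentException("The value of property " + name + " must not be null");
  }
  getProps().setProperty(name, value);
  // ... rest of the existing set() logic (deprecation handling, etc.) unchanged ...
}
{code}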





[jira] [Created] (HADOOP-8363) publish Hadoop-* sources and javadoc to maven repositories.

2012-05-04 Thread Jonathan Hsieh (JIRA)
Jonathan Hsieh created HADOOP-8363:
--

 Summary: publish Hadoop-* sources and javadoc to maven 
repositories.
 Key: HADOOP-8363
 URL: https://issues.apache.org/jira/browse/HADOOP-8363
 Project: Hadoop Common
  Issue Type: New Feature
Reporter: Jonathan Hsieh


I believe the hadoop 1.0.x series does not have the source jars published on 
maven repos.  

{code}
hbase-trunk$ mvn eclipse:eclipse -DdownloadSources -DdownloadJavadocs
...
[INFO] Wrote Eclipse project for hbase to /home/jon/proj/hbase-trunk.
[INFO] 
   Sources for some artifacts are not available.
   List of artifacts without a source archive:
 o org.apache.hadoop:hadoop-core:1.0.2
 o org.apache.hadoop:hadoop-test:1.0.2
{code}

It would be great if the poms were set up so that this would pull in the source 
jars as well!  I believe this is in place for the 0.23/2.x release lines.
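
A minimal sketch of the usual pom change that enables this (standard Maven 
plugins; exact versions and where it belongs in the Hadoop 1.0.x build would 
need checking):

{code}
<build>
  <plugins>
    <!-- Attach a -sources jar so IDEs can download sources -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-source-plugin</artifactId>
      <executions>
        <execution>
          <id>attach-sources</id>
          <goals><goal>jar-no-fork</goal></goals>
        </execution>
      </executions>
    </plugin>
    <!-- Attach a -javadoc jar as well -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-javadoc-plugin</artifactId>
      <executions>
        <execution>
          <id>attach-javadocs</id>
          <goals><goal>jar</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
{code}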


