Deadline Extension: 2013 Workshop on Middleware for HPC and Big Data Systems (MHPC'13)

2013-05-28 Thread MHPC 2013
we apologize if you receive multiple copies of this message

===

CALL FOR PAPERS

2013 Workshop on

Middleware for HPC and Big Data Systems

MHPC '13

as part of Euro-Par 2013, Aachen, Germany

===

Date: August 27, 2013

Workshop URL: http://m-hpc.org

Springer LNCS

SUBMISSION DEADLINE:

June 10, 2013 - LNCS Full paper submission (extended)
June 28, 2013 - Lightning Talk abstracts


SCOPE

Extremely large, diverse, and complex data sets are generated by
scientific applications, the Internet, social media and other sources.
Data may be physically distributed and shared by an ever larger community.
Collecting, aggregating, storing and analyzing large data volumes
present major challenges. Processing such amounts of data efficiently
has become a bottleneck for scientific discovery and technological
advancement. In addition, making the data accessible, understandable and
interoperable poses unsolved problems. Novel middleware architectures,
algorithms, and application development frameworks are required.

In this workshop we are particularly interested in original work at the
intersection of HPC and Big Data that addresses middleware handling
and optimizations. The scope covers existing and proposed middleware for
HPC and Big Data, including analytics libraries and frameworks.

The goal of this workshop is to bring together software architects,
middleware and framework developers, data-intensive application developers
as well as users from the scientific and engineering community to exchange
their experience in processing large datasets and to report their scientific
achievements and innovative ideas. The workshop also offers a dedicated forum
for these researchers to assess the state of the art, to discuss problems
and requirements, to identify gaps in current and planned designs, and to
collaborate on strategies for scalable data-intensive computing.

The workshop will be one day in length, composed of 20-minute paper
presentations, each followed by a 10-minute discussion.
Presentations may be accompanied by interactive demonstrations.


TOPICS

Topics of interest include, but are not limited to:

- Middleware including: Hadoop, Apache Drill, YARN, Spark/Shark, Hive, Pig,
Sqoop, HBase, HDFS, S4, CIEL, Oozie, Impala, Storm and Hyracks
- Data intensive middleware architecture
- Libraries/Frameworks including: Apache Mahout, Giraph, UIMA and GraphLab
- Next-generation (NG) databases including Apache Cassandra, MongoDB and CouchDB/Couchbase
- Schedulers including Cascading
- Middleware for optimized data locality/in-place data processing
- Data handling middleware for deployment in virtualized HPC environments
- Parallelization and distributed processing architectures at the
middleware level
- Integration with cloud middleware and application servers
- Runtime environments and system level support for data-intensive computing
- Skeletons and patterns
- Checkpointing
- Programming models and languages
- Big Data ETL
- Stream processing middleware
- In-memory databases for HPC
- Scalability and interoperability
- Large-scale data storage and distributed file systems
- Content-centric addressing and networking
- Execution engines, languages and environments including CIEL/Skywriting
- Performance analysis, evaluation of data-intensive middleware
- In-depth analysis and performance optimizations in existing data-handling
middleware, focusing on indexing/fast storing or retrieval between compute
and storage nodes
- Highly scalable middleware optimized for minimum communication
- Use cases and experience for popular Big Data middleware
- Middleware security, privacy and trust architectures

DATES

Papers:
Rolling abstract submission
June 10, 2013 - Full paper submission (extended)
July 8, 2013 - Acceptance notification
October 3, 2013 - Camera-ready version due

Lightning Talks:
June 28, 2013 - Deadline for lightning talk abstracts
July 15, 2013 - Lightning talk notification

August 27, 2013 - Workshop Date


TPC

CHAIR

Michael Alexander (chair), TU Wien, Austria
Anastassios Nanos (co-chair), NTUA, Greece
Jie Tao (co-chair), Karlsruhe Institute of Technology, Germany
Lizhe Wang (co-chair), Chinese Academy of Sciences, China
Gianluigi Zanetti (co-chair), CRS4, Italy

PROGRAM COMMITTEE

Amitanand Aiyer, Facebook, USA
Costas Bekas, IBM, Switzerland
Jakob Blomer, CERN, Switzerland
William Gardner, University of Guelph, Canada
José Gracia, HPC Center of the University of Stuttgart, Germany
Zhenhua Guo, Indiana University, USA
Marcus Hardt, Karlsruhe Institute of Technology, Germany
Sverre Jarp, CERN, Switzerland
Christopher Jung, Karlsruhe Institute of Technology, Germany
Andreas Knüpfer, Technische Universität Dresden, Germany
Nectarios Koziris, National Technical University of Athens, Greece
Yan Ma, Chinese Academy of Sciences, China
Martin Schulz, Lawrence Livermore National Laboratory, USA

LocalJobRunner is not using the correct JobConf to setup the OutputCommitter

2013-05-28 Thread Subroto
Hi,

I am reusing a JobClient object which internally holds a LocalJobRunner instance.
When I submit the job via the JobClient, LocalJobRunner does not use the
correct JobConf when calling OutputCommitter.setupJob().

Following is the code snippet from
org.apache.hadoop.mapred.LocalJobRunner.Job#run():
public void run() {
  JobID jobId = profile.getJobID();
  JobContext jContext = new JobContext(conf, jobId);
  OutputCommitter outputCommitter = job.getOutputCommitter();
  try {
    TaskSplitMetaInfo[] taskSplitMetaInfos =
        SplitMetaInfoReader.readSplitMetaInfo(jobId, localFs, conf, systemJobDir);
    int numReduceTasks = job.getNumReduceTasks();
    if (numReduceTasks > 1 || numReduceTasks < 0) {
      // we only allow 0 or 1 reducer in local mode
      numReduceTasks = 1;
      job.setNumReduceTasks(1);
    }
    outputCommitter.setupJob(jContext);
    status.setSetupProgress(1.0f);
    // ... some more code to start map and reduce ...
}

The JobContext created in the second line of the snippet is built with the
JobConf with which LocalJobRunner was instantiated; instead, the JobContext
should be built with the JobConf with which the Job was instantiated.
The same context is then passed to outputCommitter.setupJob().
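
For illustration, this is what I would expect instead (just a sketch; 'job' here
is the per-job JobConf that the snippet above already calls getOutputCommitter()
and getNumReduceTasks() on):

  JobID jobId = profile.getJobID();
  // build the context from the per-job conf ('job'), not the runner-level 'conf'
  JobContext jContext = new JobContext(job, jobId);
  OutputCommitter outputCommitter = job.getOutputCommitter();
  // ... rest unchanged ...
  outputCommitter.setupJob(jContext);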

Please let me know if this is a bug or whether there is some specific intention
behind this.

Cheers,
Subroto Sanyal



Pulling data from secured hadoop cluster to another hadoop cluster

2013-05-28 Thread samir das mohapatra
Hi All,
  I am able to connect to the source Hadoop cluster once SSH is established.

But I want to pull some data using distcp from the secured source Hadoop
cluster to another Hadoop cluster, and I am not able to ping the
namenode machine. In this setup, how do I run the distcp command from the
target cluster over the secured connection?

Source: hadoop.server1 (ssh secured)
Target:  hadoop.server2 (running distcp here)


running command:

distcp hftp://hadoop.server1:50070/dataSet
hdfs://hadoop.server2:54310/targetDataSet


Regards,
samir.


Re: Pulling data from secured hadoop cluster to another hadoop cluster

2013-05-28 Thread Nitin Pawar
Hadoop daemons do not use SSH to communicate.

If your distcp job could not connect to the remote server, then either the
connection was rejected by the target namenode or it was not able to
establish the network connection.

Were you able to see the HDFS on server1 from server2?
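
For example, a quick check (using the paths from your mail) is to run
hadoop fs -ls hftp://hadoop.server1:50070/dataSet from a node in the target
cluster; if that listing fails, distcp will fail in the same way.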


On Tue, May 28, 2013 at 5:17 PM, samir das mohapatra 
samir.help...@gmail.com wrote:

 Hi All,
   I could able to connect the hadoop (source ) cluster after ssh is
 established.

 But i wanted to know, If I want to pull some data using distcp from source
 secured hadoop box to another hadoop cluster , I could not able to ping
 name node machine.  In this approach how to run distcp command from target
 cluster in with secured connection.

 Source: hadoop.server1 (ssh secured)
 Target:  hadoop.server2 (runing distcp here)


 running command:

 distcp hftp://hadoop.server1:50070/dataSet
 hdfs://hadoop.server2:54310/targetDataSet


 Regards,
 samir.




-- 
Nitin Pawar


debugging of map reduce tasks

2013-05-28 Thread Günter Hipler

Hi

sorry, just a beginners question...

I'm trying to debug a simple MapReduce task (the MaxTemperature example from
"Hadoop: The Definitive Guide").


My understanding: with a standalone (local mode) installation it should be
possible to debug even the map and reduce tasks.


I found some descriptions for Eclipse as well as for Intellij
http://www.thecloudavenue.com/2012/10/debugging-hadoop-mapreduce-program-in.html
http://vichargrave.com/debugging-hadoop-applications-with-intellij/
http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/

Personally I'm working with IntelliJ.

I can start the job and I get a correct result, but the process doesn't
stop in the Mapper or Reducer even with breakpoints set. I have tried both
the old and the new API.


Thanks for some hints!

Günter



--
Universität Basel
Universitätsbibliothek
Günter Hipler
Projekt SwissBib
Schoenbeinstrasse 18-20
4056 Basel, Schweiz
Tel.: +41 (0)61 267 31 12 Fax: +41 61 267 3103
E-Mail guenter.hip...@unibas.ch
URL: www.swissbib.org  / http://www.ub.unibas.ch/



Re: Pulling data from secured hadoop cluster to another hadoop cluster

2013-05-28 Thread Shahab Yunus
Also, Samir, when you say 'secured', is that cluster by any chance secured
with Kerberos (rather than SSH)?

-Shahab


On Tue, May 28, 2013 at 8:29 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 hadoop daemons do not use ssh to communicate.

 if  your distcp job could not connect to remote server then either the
 connection was rejected by the target namenode or the it was not able to
 establish the network connection.

 were you able to see the hdfs on server1 from server2?


 On Tue, May 28, 2013 at 5:17 PM, samir das mohapatra 
 samir.help...@gmail.com wrote:

 Hi All,
   I could able to connect the hadoop (source ) cluster after ssh is
 established.

 But i wanted to know, If I want to pull some data using distcp from
 source secured hadoop box to another hadoop cluster , I could not able to
 ping  name node machine.  In this approach how to run distcp command from
 target cluster in with secured connection.

 Source: hadoop.server1 (ssh secured)
 Target:  hadoop.server2 (runing distcp here)


 running command:

 distcp hftp://hadoop.server1:50070/dataSet
 hdfs://hadoop.server2:54310/targetDataSet


 Regards,
 samir.




 --
 Nitin Pawar



Re: Child Error

2013-05-28 Thread Jean-Marc Spaggiari
That's strange.

So now each time you run it, it's failing with the exception below?
Or it's sometime working, sometime failing?

Also, can you clear your tmp directory and make sure you have enough space
in it before you retry?

JM

2013/5/27 Jim Twensky jim.twen...@gmail.com

 Hi Jean,

 I switched to Oracle JDK 1.6 as you suggested and ran a job successfully
 this afternoon which lasted for about 3 hours - this job was producing
 errors with open JDK normally. I then stopped the cluster, re-started it
 again and tried running the same job but I got the same failure to log'in
 error using the Oracle JDK. This is really weird and unusual. I am pasting
 the stack trace below. It occurred in 3 different nodes out of 20. Any
 suggestions?





 Exception in thread main java.io.IOException: Exception reading
 file:/var/tmp/jim/hadoop-jim/mapred/local/taskTracker/jim/jobcache/job_201305262239_0002/jobToken

 at
 org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:135)
 at
 org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)
 at org.apache.hadoop.mapred.Child.main(Child.java:92)
 Caused by: java.io.IOException: failure to login
 at
 org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:501)
 at
 org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:463)
 at
 org.apache.hadoop.fs.FileSystem$Cache$Key.init(FileSystem.java:1519)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at
 org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129)
 ... 2 more
 Caused by: javax.security.auth.login.LoginException:
 java.lang.NullPointerException: invalid null input: name
 at com.sun.security.auth.UnixPrincipal.init(UnixPrincipal.java:53)
 at
 com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:114)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
 at
 javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
 at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
 at java.security.AccessController.doPrivileged(Native Method)
 at
 javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
 at
 org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:482)

 at
 org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:463)
 at
 org.apache.hadoop.fs.FileSystem$Cache$Key.init(FileSystem.java:1519)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at
 org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129)
 at
 org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)
 at org.apache.hadoop.mapred.Child.main(Child.java:92)

 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:872)
 at
 javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
 at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
 at java.security.AccessController.doPrivileged(Native Method)
 at
 javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
 at
 org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:482)


 On Sat, May 25, 2013 at 12:14 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Hi Jim,

 Will you be able to do the same test with Oracle JDK 1.6 instead of
 OpenJDK 1.7 to see if it maked a difference?

 JM


 2013/5/25 Jim Twensky jim.twen...@gmail.com

 Hi Jean, thanks for replying. I'm using java 1.7.0_21 on ubuntu. Here is
 the output:

 $ java -version
 java version 1.7.0_21
 OpenJDK Runtime Environment (IcedTea 2.3.9) (7u21-2.3.9-0ubuntu0.12.10.1)
 OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)

 I don't get any OOME errors and this error happens on random nodes, not
 a particular one. Usually all tasks running on a particular node fail and
 that node gets blacklisted. However, the same node works just fine during
 the next or previous jobs. Can it be a problem with the ssh keys? What else
 can cause the IOException with 

Re: Pulling data from secured hadoop cluster to another hadoop cluster

2013-05-28 Thread samir das mohapatra
It is not a Hadoop security issue; the security is on the host, I mean at the
network level.

I am not able to ping because the source system is set up such that you can
only connect to it through SSH.
If this is the case, how do I overcome this problem?

What extra parameter do I need to add at the SSH level so that I can ping
the machine? All the servers are in the same domain.




On Tue, May 28, 2013 at 7:35 PM, Shahab Yunus shahab.yu...@gmail.comwrote:

 Also Samir, when you say 'secured', by any chance that cluster is secured
 with Kerberos (rather than ssh)?

 -Shahab


 On Tue, May 28, 2013 at 8:29 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 hadoop daemons do not use ssh to communicate.

 if  your distcp job could not connect to remote server then either the
 connection was rejected by the target namenode or the it was not able to
 establish the network connection.

 were you able to see the hdfs on server1 from server2?


 On Tue, May 28, 2013 at 5:17 PM, samir das mohapatra 
 samir.help...@gmail.com wrote:

 Hi All,
   I could able to connect the hadoop (source ) cluster after ssh is
 established.

 But i wanted to know, If I want to pull some data using distcp from
 source secured hadoop box to another hadoop cluster , I could not able to
 ping  name node machine.  In this approach how to run distcp command from
 target cluster in with secured connection.

 Source: hadoop.server1 (ssh secured)
 Target:  hadoop.server2 (runing distcp here)


 running command:

 distcp hftp://hadoop.server1:50070/dataSet
 hdfs://hadoop.server2:54310/targetDataSet


 Regards,
 samir.




 --
 Nitin Pawar





Re: Child Error

2013-05-28 Thread Jim Twensky
It sometimes works and sometimes fails. Usually, if my tasks take more than
an hour, at least one node fails and gets blacklisted. I have seen at most
five nodes do this.

I cleared my tmp directory, tried using /var/tmp for dfs and mapred storage
and I have over 300 GB free space in all nodes. Is there another log or
maybe a system log file that I can look at to see the root cause of this
issue?


On Tue, May 28, 2013 at 9:24 AM, Jean-Marc Spaggiari 
jean-m...@spaggiari.org wrote:

 That's strange.

 So now each time you are running it it's railing with the exception below?
 Or it's sometime working, sometime failing?

 also, can you clear you tmp directory and make sure you have enough space
 it it before you retry?

 JM


 2013/5/27 Jim Twensky jim.twen...@gmail.com

 Hi Jean,

 I switched to Oracle JDK 1.6 as you suggested and ran a job successfully
 this afternoon which lasted for about 3 hours - this job was producing
 errors with open JDK normally. I then stopped the cluster, re-started it
 again and tried running the same job but I got the same failure to log'in
 error using the Oracle JDK. This is really weird and unusual. I am pasting
 the stack trace below. It occurred in 3 different nodes out of 20. Any
 suggestions?





 Exception in thread main java.io.IOException: Exception reading
 file:/var/tmp/jim/hadoop-jim/mapred/local/taskTracker/jim/jobcache/job_201305262239_0002/jobToken

 at
 org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:135)
 at
 org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)
 at org.apache.hadoop.mapred.Child.main(Child.java:92)
 Caused by: java.io.IOException: failure to login
 at
 org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:501)
 at
 org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:463)
 at
 org.apache.hadoop.fs.FileSystem$Cache$Key.init(FileSystem.java:1519)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at
 org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129)
 ... 2 more
 Caused by: javax.security.auth.login.LoginException:
 java.lang.NullPointerException: invalid null input: name
 at com.sun.security.auth.UnixPrincipal.init(UnixPrincipal.java:53)
 at
 com.sun.security.auth.module.UnixLoginModule.login(UnixLoginModule.java:114)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 javax.security.auth.login.LoginContext.invoke(LoginContext.java:769)
 at
 javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
 at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
 at java.security.AccessController.doPrivileged(Native Method)
 at
 javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
 at
 org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:482)

 at
 org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:463)
 at
 org.apache.hadoop.fs.FileSystem$Cache$Key.init(FileSystem.java:1519)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1420)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
 at
 org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:129)
 at
 org.apache.hadoop.mapreduce.security.TokenCache.loadTokens(TokenCache.java:165)
 at org.apache.hadoop.mapred.Child.main(Child.java:92)

 at
 javax.security.auth.login.LoginContext.invoke(LoginContext.java:872)
 at
 javax.security.auth.login.LoginContext.access$000(LoginContext.java:186)
 at javax.security.auth.login.LoginContext$5.run(LoginContext.java:706)
 at java.security.AccessController.doPrivileged(Native Method)
 at
 javax.security.auth.login.LoginContext.invokeCreatorPriv(LoginContext.java:703)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:575)
 at
 org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:482)


 On Sat, May 25, 2013 at 12:14 PM, Jean-Marc Spaggiari 
 jean-m...@spaggiari.org wrote:

 Hi Jim,

 Will you be able to do the same test with Oracle JDK 1.6 instead of
 OpenJDK 1.7 to see if it maked a difference?

 JM


 2013/5/25 Jim Twensky jim.twen...@gmail.com

 Hi Jean, thanks for replying. I'm using java 1.7.0_21 on ubuntu. Here
 is the output:

 $ java 

issue launching mapreduce job with kerberos secured hadoop

2013-05-28 Thread Neeraj Chaplot
Hi All,

When Hadoop is started with Kerberos authentication, hadoop fs commands work
well but MR jobs fail.

A simple wordcount program fails at the reduce stage with the following exception:
2013-05-28 17:43:58,896 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_201305281729_0003_r_00_1 copy failed:
attempt_201305281729_0003_m_00_0 from 192.168.49.51
2013-05-28 17:43:58,897 WARN org.apache.hadoop.mapred.ReduceTask:
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
at sun.net.www.http.HttpClient.init(HttpClient.java:233)
at sun.net.www.http.HttpClient.New(HttpClient.java:306)
at sun.net.www.http.HttpClient.New(HttpClient.java:323)
at
sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
at
sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
at
sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1618)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)

Please provide some inputs to fix the issue.

Thanks


Not saving any output

2013-05-28 Thread jamal sasha
Hi,
  I want to process some text files and then save the output in a database.
I am using Python (Hadoop streaming), with MongoDB as the backend.
Is it possible to run Hadoop streaming jobs without specifying any output?
What is the best way to deal with this?


Re: issue launching mapreduce job with kerberos secured hadoop

2013-05-28 Thread Shahab Yunus
Have you verified that the Kerberos settings are configured properly in
mapred-site.xml too, just as in hdfs-site.xml (assuming you are using MRv1)?
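
For MRv1 that usually means entries along these lines in mapred-site.xml
(illustrative values only, substitute your own realm and keytab paths):

<property>
  <name>mapreduce.jobtracker.kerberos.principal</name>
  <value>mapred/_HOST@YOUR-REALM.COM</value>
</property>
<property>
  <name>mapreduce.jobtracker.keytab.file</name>
  <value>/etc/hadoop/conf/mapred.keytab</value>
</property>
<property>
  <name>mapreduce.tasktracker.kerberos.principal</name>
  <value>mapred/_HOST@YOUR-REALM.COM</value>
</property>
<property>
  <name>mapreduce.tasktracker.keytab.file</name>
  <value>/etc/hadoop/conf/mapred.keytab</value>
</property>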

-Shahab


On Tue, May 28, 2013 at 9:06 AM, Neeraj Chaplot geek...@gmail.com wrote:

 Hi All,

 When hadoop started with Kerberos authentication hadoop fs commands work
 well but MR job fails.

 Simple wordcount program fails at reducer stage giving follwoing exception
 :
 013-05-28 17:43:58,896 WARN org.apache.hadoop.mapred.
 ReduceTask: attempt_201305281729_0003_r_00_1 copy failed:
 attempt_201305281729_0003_m_00_0 from 192.168.49.51
 2013-05-28 17:43:58,897 WARN org.apache.hadoop.mapred.ReduceTask:
 java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
 at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
 at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
 at java.net.Socket.connect(Socket.java:529)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
 at sun.net.www.http.HttpClient.init(HttpClient.java:233)
 at sun.net.www.http.HttpClient.New(HttpClient.java:306)
 at sun.net.www.http.HttpClient.New(HttpClient.java:323)
 at
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
 at
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
 at
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1618)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)

 Please provide some inputs to fix the issue.

 Thanks




Re: Not saving any output

2013-05-28 Thread Kai Voigt
You can have your python streaming script simply not write any key/value pairs 
to stdout, so you'll get an empty job output.

Independently, your script could do anything external, such as connecting to a
remote database and storing data there. You probably want to avoid too many
tasks doing this in parallel.

But more common would be a regular job that writes data to HDFS, followed by
Sqoop to load that data into an RDBMS. But it's your choice.

Kai

Am 28.05.2013 um 20:57 schrieb jamal sasha jamalsha...@gmail.com:

 Hi,
   I want to process some text files and then save the output in a db.
 I am using python (hadoop streaming).
 I am using mongo as backend server.
 Is it possible to run hadoop streaming jobs without specifying any output?
 What is the best way to deal with this.
 

-- 
Kai Voigt
k...@123.org







Re: Apache Flume Properties File

2013-05-28 Thread Connor Woodson
Here is a good guide on getting started with Flume; about a third of the
way down is a very basic configuration you can copy+paste to test it out.

https://cwiki.apache.org/FLUME/getting-started.html

- Connor


On Fri, May 24, 2013 at 2:13 PM, Raj Hadoop hadoop...@yahoo.com wrote:

 Hi,

 I just installed Apache Flume 1.3.1 and trying to run a small example to
 test. Can any one suggest me how can I do this? I am going through the
 documentation right now.

 Thanks,
 Raj





Introducing Weave - Apache YARN simplified.

2013-05-28 Thread Nitin Motgi
Hi All,

At Continuuity, we use Apache YARN as an integral part of our products because 
of its vision to support a diverse set of applications and processing patterns. 
One such product is BigFlow, our realtime distributed stream-processing engine. 
Apache YARN is used to run and manage BigFlow applications including lifecycle 
and runtime elasticity. During our journey with YARN we have come to the 
realization that it is extremely powerful but its full capability is 
challenging to leverage.  It is difficult to get started, hard to test and 
debug, and complex to build new kinds of non-MapReduce applications and 
frameworks on. 

We decided to build Weave and be part of the journey to take Apache YARN to the 
next level of usability and functionality. We have been using Weave extensively 
to support our products and have seen the benefit and power of Apache YARN and 
Weave combined.  We have decided to share Weave under the Apache 2.0 license in 
an effort to collaborate with members of the community, broaden the set of 
applications and patterns that Weave supports, and further the overall adoption 
of Apache YARN. 

Weave is NOT a replacement for Apache YARN.  It is instead a value-added 
framework that operates on top of Apache YARN. 

What is Weave ?
==

Weave is a simple set of libraries that allows you to easily manage distributed 
applications through an abstraction layer built on Apache YARN. Weave allows 
you to use YARN’s distributed capabilities with a programming model that is 
similar to running threads. 

Features of Weave
===

  - Simple API for specifying, running and managing application lifecycle
  - An easy way to communicate with an application or parts of an application
  - A generic Application Master to better support simple applications
  - Simplified archive management and local file transport
  - Improved control over application logs, metrics and errors
  - Discovery service
  - And many more...

Where can I get this coolness ?
=

The code is available on github at http://github.com/continuuity/weave under 
the Apache 2.0 License.

We will continue adding features to improve Weave, but would love to hear from 
people who are willing to be contributors to this project and help us make it 
better.  If you are not interested in contributing - no problem - we would  
still love to hear your comments, questions, and concerns.

Thanks for your time and we look forward to hearing your thoughts on Weave.

The Continuuity Team
o...@continuuity.com

NM/AM interaction

2013-05-28 Thread John Lilley
I was reading from the HortonWorks blog:
How MapReduce shuffle takes advantage of NM's Auxiliary-services
The Shuffle functionality required to run a MapReduce (MR) application is 
implemented as an Auxiliary Service. This service starts up a Netty Web Server, 
and knows how to handle MR specific shuffle requests from Reduce tasks. The MR 
AM specifies the service id for the shuffle service, along with security tokens 
that may be required. The NM provides the AM with the port on which the shuffle 
service is running which is passed onto the Reduce tasks.
How does the AM get the service ID and the port?
Thanks,
John



Re: NM/AM interaction

2013-05-28 Thread Harsh J
All AuxiliaryServices are configured in yarn-site.xml with a service ID, and a
class is associated with the defined service ID. For MR2, one generally adds
the two properties below: the first defines the name (service ID), the second
defines the class to launch for that name:

<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>

As part of the interface, all AuxiliaryServices may submit back some
metadata that they would require clients to be aware of. For the
ShuffleHandler, the port is rather important, so it serializes it via
the getMeta() interface. [1]

As part of any startContainer(…) response from the NodeManager's
ContainerManager service, all metadata of all available auxiliary
services are shipped back as part of a successful response to a
container start request. This is a mapping based on (configured
service ID -> metadata) for every aux service configured and currently
running. [2]

A client, such as MR2, receives this batch of metadata and
deserializes whatever it is looking for [3] using the service ID name
string it is aware of [4].
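
In code, the AM side boils down to roughly the following (a sketch only; the
exact getter on the container-start response varies between versions, so treat
the method names here as illustrative):

// metadata map keyed by the configured service ID
Map<String, ByteBuffer> metas = startContainerResponse.getAllServiceResponse();
ByteBuffer shuffleMeta = metas.get("mapreduce.shuffle");
// ShuffleHandler knows how it serialized its own metadata (the port)
int shufflePort = ShuffleHandler.deserializeMetaData(shuffleMeta);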

[1] - 
https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/AuxServices.java#L195
and 
https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L328
[2] - 
https://github.com/apache/hadoop-common/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/main/java/org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java#L502
[3] - 
https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/launcher/ContainerLauncherImpl.java#L165
[4] - 
https://github.com/apache/hadoop-common/blob/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java#L148

On Wed, May 29, 2013 at 4:13 AM, John Lilley john.lil...@redpoint.net wrote:
 I was reading from the HortonWorks blog:

 “How MapReduce shuffle takes advantage of NM’s Auxiliary-services

 The Shuffle functionality required to run a MapReduce (MR) application is
 implemented as an Auxiliary Service. This service starts up a Netty Web
 Server, and knows how to handle MR specific shuffle requests from Reduce
 tasks. The MR AM specifies the service id for the shuffle service, along
 with security tokens that may be required. The NM provides the AM with the
 port on which the shuffle service is running which is passed onto the Reduce
 tasks.”

 How does the AM get the service ID and the port?

 Thanks,

 John





--
Harsh J


Re: issue launching mapreduce job with kerberos secured hadoop

2013-05-28 Thread Rahul Bhattacharjee
The error looks a little low level, at the network level. The HTTP server for
some reason couldn't bind to the port.
It might have nothing to do with Kerberos.
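
A quick way to confirm is to check whether anything is listening on the
TaskTracker's shuffle HTTP port on 192.168.49.51 at all. A minimal check
(assuming the usual MRv1 default of port 50060, i.e.
mapred.task.tracker.http.address):

public class PortCheck {
  public static void main(String[] args) throws Exception {
    // fails with "Connection refused" if nothing is listening on the shuffle port
    java.net.Socket s = new java.net.Socket("192.168.49.51", 50060);
    System.out.println("shuffle port reachable");
    s.close();
  }
}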

Thanks,
Rahul


On Tue, May 28, 2013 at 6:36 PM, Neeraj Chaplot geek...@gmail.com wrote:

 Hi All,

 When hadoop started with Kerberos authentication hadoop fs commands work
 well but MR job fails.

 Simple wordcount program fails at reducer stage giving follwoing exception
 :
 013-05-28 17:43:58,896 WARN org.apache.hadoop.mapred.
 ReduceTask: attempt_201305281729_0003_r_00_1 copy failed:
 attempt_201305281729_0003_m_00_0 from 192.168.49.51
 2013-05-28 17:43:58,897 WARN org.apache.hadoop.mapred.ReduceTask:
 java.net.ConnectException: Connection refused
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
 at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
 at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
 at java.net.Socket.connect(Socket.java:529)
 at sun.net.NetworkClient.doConnect(NetworkClient.java:158)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:394)
 at sun.net.www.http.HttpClient.openServer(HttpClient.java:529)
 at sun.net.www.http.HttpClient.init(HttpClient.java:233)
 at sun.net.www.http.HttpClient.New(HttpClient.java:306)
 at sun.net.www.http.HttpClient.New(HttpClient.java:323)
 at
 sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:970)
 at
 sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:911)
 at
 sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:836)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1618)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.setupSecureConnection(ReduceTask.java:1575)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1483)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1394)
 at
 org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1326)

 Please provide some inputs to fix the issue.

 Thanks