Re: Accessing HDFS files from a servlet

2012-04-15 Thread sushil sontakke
I am getting this error when running that JSP as a servlet:

Apr 15, 2012 2:34:07 PM org.apache.catalina.core.AprLifecycleListener init
INFO: Loaded APR based Apache Tomcat Native library 1.1.22.
Apr 15, 2012 2:34:07 PM org.apache.catalina.core.AprLifecycleListener init
INFO: APR capabilities: IPv6 [false], sendfile [true], accept filters
[false], random [true].
Apr 15, 2012 2:34:07 PM org.apache.tomcat.util.digester.SetPropertiesRule
begin
WARNING: [SetPropertiesRule]{Server/Service/Engine/Host/Context} Setting
property 'source' to 'org.eclipse.jst.j2ee.server:UserInterface' did not
find a matching property.
Apr 15, 2012 2:34:08 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [http-apr-8080]
Apr 15, 2012 2:34:08 PM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [ajp-apr-8009]
Apr 15, 2012 2:34:08 PM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1985 ms
Apr 15, 2012 2:34:08 PM org.apache.catalina.core.StandardService
startInternal
INFO: Starting service Catalina
Apr 15, 2012 2:34:08 PM org.apache.catalina.core.StandardEngine
startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.26
Apr 15, 2012 2:34:09 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [http-apr-8080]
Apr 15, 2012 2:34:09 PM org.apache.coyote.AbstractProtocol start
INFO: Starting ProtocolHandler [ajp-apr-8009]
Apr 15, 2012 2:34:09 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 846 ms
Apr 15, 2012 2:34:10 PM org.apache.catalina.core.ApplicationContext log
INFO: Marking servlet DisplayFile as unavailable
Apr 15, 2012 2:34:10 PM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Allocate exception for servlet DisplayFile
java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1701)
at
org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1546)
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Unknown Source)
at java.lang.Class.getConstructor0(Unknown Source)
at java.lang.Class.newInstance0(Unknown Source)
at java.lang.Class.newInstance(Unknown Source)
at
org.apache.catalina.core.DefaultInstanceManager.newInstance(DefaultInstanceManager.java:125)
at
org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1136)
at
org.apache.catalina.core.StandardWrapper.allocate(StandardWrapper.java:857)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:135)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at
org.apache.tomcat.util.net.AprEndpoint$SocketProcessor.run(AprEndpoint.java:1805)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown
Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
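The ClassNotFoundException for org.apache.hadoop.fs.FSDataInputStream means the webapp class loader cannot see the Hadoop client classes; the usual fix is to put hadoop-core-<version>.jar (plus its dependencies such as commons-logging and commons-configuration) into the webapp's WEB-INF/lib, or onto Tomcat's shared lib directory. For reference, a minimal sketch of a servlet that streams a file from HDFS; the fs.default.name URI, the HDFS path, and the servlet body here are assumptions for illustration, not code from the original post:

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class DisplayFile extends HttpServlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        Configuration conf = new Configuration();
        // Placeholder namenode URI; take the real value from core-site.xml.
        conf.set("fs.default.name", "hdfs://localhost:54310");
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = null;
        try {
            // Placeholder HDFS path.
            in = fs.open(new Path("/user/example/input.txt"));
            // Copy the HDFS file to the HTTP response; do not let copyBytes close the streams.
            IOUtils.copyBytes(in, response.getOutputStream(), conf, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}

With the jars in WEB-INF/lib the webapp class loader resolves FSDataInputStream itself, and the "Marking servlet DisplayFile as unavailable" message should go away.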


Re: Issue with loading the Snappy Codec

2012-04-15 Thread Bas Hickendorff
Thanks.

I have the native Snappy libraries installed. However, I use the
normal jars that you get when downloading Hadoop; I am not compiling
Hadoop myself.

I do not want to use the snappy codec (I don't care about compression
at the moment), but it seems it is needed anyway? I added this to the
mapred-site.xml:

<property>
  <name>mapred.compress.map.output</name>
  <value>false</value>
</property>

But it still fails with the error of my previous email (SnappyCodec not found).

Regards,

Bas


On Sat, Apr 14, 2012 at 6:30 PM, Vinod Kumar Vavilapalli
vino...@hortonworks.com wrote:

 Hadoop has integrated snappy via installed native libraries instead of 
 snappy-java.jar (ref https://issues.apache.org/jira/browse/HADOOP-7206)
  - You need to have the snappy system libraries (snappy and snappy-devel) 
 installed before you compile hadoop. (RPMs are available on the web, 
 http://pkgs.org/centos-5-rhel-5/epel-i386/21/ for example)
  - When you build hadoop, you will need to compile the native libraries(by 
 passing -Dcompile.native=true to ant) to avail snappy support.
  - You also need to make sure that snappy system library is available on the 
 library path for all mapreduce tasks at runtime. Usually if you install them 
 on /usr/lib or /usr/local/lib, it should work.

 HTH,
 +Vinod

 On Apr 14, 2012, at 4:36 AM, Bas Hickendorff wrote:

 Hello,

 When I start a map-reduce job, it starts, and after a short while,
 fails with the error below (SnappyCodec not found).

 I am currently starting the job from other Java code (so the Hadoop
 executable in the bin directory is not used anymore), but in principle
 this seems to work (in the admin of the Jobtracker the job shows up
 when it starts). However after a short while the map task fails with:


 java.lang.IllegalArgumentException: Compression codec
 org.apache.hadoop.io.compress.SnappyCodec not found.
       at 
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
       at 
 org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:134)
       at 
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:62)
       at 
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:416)
       at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
       at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.io.compress.SnappyCodec
       at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
       at java.security.AccessController.doPrivileged(Native Method)
       at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
       at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:334)
       at java.lang.Class.forName0(Native Method)
       at java.lang.Class.forName(Class.java:264)
       at 
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
       at 
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
       ... 10 more


 I confirmed that the SnappyCodec class is present in the
 hadoop-core-1.0.2.jar, and the snappy-java-1.0.4.1.jar is present as
 well. The directory of those jars is on the HADOOP_CLASSPATH, but it
 seems it still cannot find it. I also checked that the config files of
 Hadoop are read. I run all nodes on localhost.

 Any suggestions on what could be the cause of the issue?

 Regards,

 Bas



Re: Issue with loading the Snappy Codec

2012-04-15 Thread john smith
Can you restart tasktrackers once and run the job again? It refreshes the
class path.

On Sun, Apr 15, 2012 at 11:58 AM, Bas Hickendorff
hickendorff...@gmail.com wrote:

 Thanks.

 The native snappy libraries I have installed. However, I use the
 normal jars that you get when downloading Hadoop, I am not compiling
 Hadoop myself.

 I do not want to use the snappy codec (I don't care about compression
 at the moment), but it seems it is needed anyway? I added this to the
 mapred-site.xml:

 <property>
   <name>mapred.compress.map.output</name>
   <value>false</value>
 </property>

 But it still fails with the error of my previous email (SnappyCodec not
 found).

 Regards,

 Bas


 On Sat, Apr 14, 2012 at 6:30 PM, Vinod Kumar Vavilapalli
 vino...@hortonworks.com wrote:
 
  Hadoop has integrated snappy via installed native libraries instead of
 snappy-java.jar (ref https://issues.apache.org/jira/browse/HADOOP-7206)
   - You need to have the snappy system libraries (snappy and
 snappy-devel) installed before you compile hadoop. (RPMs are available on
 the web, http://pkgs.org/centos-5-rhel-5/epel-i386/21/ for example)
   - When you build hadoop, you will need to compile the native
 libraries(by passing -Dcompile.native=true to ant) to avail snappy support.
   - You also need to make sure that snappy system library is available on
 the library path for all mapreduce tasks at runtime. Usually if you install
 them on /usr/lib or /usr/local/lib, it should work.
 
  HTH,
  +Vinod
 
  On Apr 14, 2012, at 4:36 AM, Bas Hickendorff wrote:
 
  Hello,
 
  When I start a map-reduce job, it starts, and after a short while,
  fails with the error below (SnappyCodec not found).
 
  I am currently starting the job from other Java code (so the Hadoop
  executable in the bin directory is not used anymore), but in principle
  this seems to work (in the admin of the Jobtracker the job shows up
  when it starts). However after a short while the map task fails with:
 
 
  java.lang.IllegalArgumentException: Compression codec
  org.apache.hadoop.io.compress.SnappyCodec not found.
at
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
at
 org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:134)
at
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:62)
at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
  Caused by: java.lang.ClassNotFoundException:
  org.apache.hadoop.io.compress.SnappyCodec
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:334)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
... 10 more
 
 
  I confirmed that the SnappyCodec class is present in the
  hadoop-core-1.0.2.jar, and the snappy-java-1.0.4.1.jar is present as
  well. The directory of those jars is on the HADOOP_CLASSPATH, but it
  seems it still cannot find it. I also checked that the config files of
  Hadoop are read. I run all nodes on localhost.
 
  Any suggestions on what could be the cause of the issue?
 
  Regards,
 
  Bas
 



Re: Issue with loading the Snappy Codec

2012-04-15 Thread Bas Hickendorff
Hello John,

I did restart them (in fact, I did a full reboot of the machine). The
error is still there.

I guess my question is: is it expected that Hadoop needs to do
something with the SnappyCodec when mapred.compress.map.output is set
to false?

Regards,

Bas

On Sun, Apr 15, 2012 at 12:04 PM, john smith js1987.sm...@gmail.com wrote:
 Can you restart tasktrackers once and run the job again? It refreshes the
 class path.

 On Sun, Apr 15, 2012 at 11:58 AM, Bas Hickendorff
 hickendorff...@gmail.com wrote:

 Thanks.

 The native snappy libraries I have installed. However, I use the
 normal jars that you get when downloading Hadoop, I am not compiling
 Hadoop myself.

 I do not want to use the snappy codec (I don't care about compression
 at the moment), but it seems it is needed anyway? I added this to the
 mapred-site.xml:

 <property>
         <name>mapred.compress.map.output</name>
         <value>false</value>
 </property>

 But it still fails with the error of my previous email (SnappyCodec not
 found).

 Regards,

 Bas


 On Sat, Apr 14, 2012 at 6:30 PM, Vinod Kumar Vavilapalli
 vino...@hortonworks.com wrote:
 
  Hadoop has integrated snappy via installed native libraries instead of
 snappy-java.jar (ref https://issues.apache.org/jira/browse/HADOOP-7206)
   - You need to have the snappy system libraries (snappy and
 snappy-devel) installed before you compile hadoop. (RPMs are available on
 the web, http://pkgs.org/centos-5-rhel-5/epel-i386/21/ for example)
   - When you build hadoop, you will need to compile the native
 libraries(by passing -Dcompile.native=true to ant) to avail snappy support.
   - You also need to make sure that snappy system library is available on
 the library path for all mapreduce tasks at runtime. Usually if you install
 them on /usr/lib or /usr/local/lib, it should work.
 
  HTH,
  +Vinod
 
  On Apr 14, 2012, at 4:36 AM, Bas Hickendorff wrote:
 
  Hello,
 
  When I start a map-reduce job, it starts, and after a short while,
  fails with the error below (SnappyCodec not found).
 
  I am currently starting the job from other Java code (so the Hadoop
  executable in the bin directory is not used anymore), but in principle
  this seems to work (in the admin of the Jobtracker the job shows up
  when it starts). However after a short while the map task fails with:
 
 
  java.lang.IllegalArgumentException: Compression codec
  org.apache.hadoop.io.compress.SnappyCodec not found.
        at
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
        at
 org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:134)
        at
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:62)
        at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
  Caused by: java.lang.ClassNotFoundException:
  org.apache.hadoop.io.compress.SnappyCodec
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:334)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
        at
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
        ... 10 more
 
 
  I confirmed that the SnappyCodec class is present in the
  hadoop-core-1.0.2.jar, and the snappy-java-1.0.4.1.jar is present as
  well. The directory of those jars is on the HADOOP_CLASSPATH, but it
  seems it still cannot find it. I also checked that the config files of
  Hadoop are read. I run all nodes on localhost.
 
  Any suggestions on what could be the cause of the issue?
 
  Regards,
 
  Bas
 



hadoop namenode -format : at boot time?

2012-04-15 Thread biro lehel
Dear all,

I was wondering if it is possible to format HDFS at boot time. I have some 
VMs that are pre-set and pre-configured with Hadoop (datanodes [slaves] and a 
namenode [master]), and I'm looking for a way to get a working cluster from them 
out of the box, as soon as they are launched (including the namenode).

I currently have the following boot-time init script on my namenode:

#!/bin/bash
#
#
# Starts a Hadoop Master
#
# chkconfig: 2345 90 10
# description: Hadoop master
 
. /etc/rc.status
. /home/oneadmin/mountpoint/hadoop-0.20.2/conf/hadoop-env.sh
export HPATH=/home/oneadmin/mountpoint/hadoop-0.20.2
export HLOCK=/tmp

RETVAL=0
PIDFILE=$HLOCK/hadoop-hdfs-master.pid
desc="Hadoop Master daemon"
 
start() {
  echo -n $"Starting $desc (hadoop): "
  /sbin/startproc -u 1001 $HPATH/bin/hadoop namenode -format
  /sbin/startproc -u 1001 $HPATH/bin/start-dfs.sh $1
  /sbin/startproc -u 1001 $HPATH/bin/start-mapred.sh $1
  RETVAL=$?
  echo
  [ $RETVAL -eq 0 ] && touch $HLOCK/hadoop-master
  return $RETVAL
}
 
stop() {
  echo -n $"Stopping $desc (hadoop): "
  /sbin/startproc -u 1001 $HPATH/bin/stop-all.sh
  RETVAL=$?
  sleep 5
  echo
  [ $RETVAL -eq 0 ] && rm -f $HLOCK/hadoop-master $PIDFILE
}
 
checkstatus(){
  jps | grep NameNode
}
 
restart() {
  stop
  start
}
 
format() {
  /sbin/startproc -u 1001 $HPATH/bin/hadoop namenode -format
}
 
case $1 in
  start)
    start
    ;;
  upgrade)
    upgrade
    ;;
  format)
    format
    ;;
  stop)
    stop
    ;;
  status)
    checkstatus
    ;;
  restart)
    restart
    ;;
  *)
    echo $"Usage: $0 {start|stop|status|restart|try-restart}"
    exit 1
esac
 
exit $RETVAL 


As you can see, I included the format command in the start function as well; 
however, it is not working. All the Hadoop processes except the NameNode start, and 
HDFS isn't being formatted.

Is it possible to achieve the functionality I'm looking for? Any 
suggestions would be highly appreciated.

Thank you,
Lehel Biro.
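One likely reason the format step does nothing useful under startproc: when the name directory already exists, hadoop namenode -format prompts interactively ("Re-format filesystem ... ? (Y or N)") and waits on stdin, and formatting unconditionally at every boot would also wipe HDFS each time. A sketch of a guarded, non-interactive variant for start(), assuming Hadoop 0.20.x and a placeholder dfs.name.dir path:

NAME_DIR=/home/oneadmin/hdfs/name   # placeholder; must match dfs.name.dir in hdfs-site.xml
if [ ! -d "$NAME_DIR/current" ]; then
  # Only format a namenode that has never been formatted; pipe a capital Y
  # in case the re-format confirmation prompt appears anyway.
  echo Y | $HPATH/bin/hadoop namenode -format
fi

Running this before start-dfs.sh, instead of the unconditional format call, avoids both the interactive prompt and accidental reformatting of an already-initialized namenode.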


Re: Issue with loading the Snappy Codec

2012-04-15 Thread JAX
That is odd. Why would it crash when your m/r job did not rely on Snappy?

One possibility: maybe because your input is Snappy-compressed, Hadoop is
detecting that compression and trying to use the Snappy codec to decompress it?

Jay Vyas 
MMSB
UCHC

On Apr 15, 2012, at 5:08 AM, Bas Hickendorff hickendorff...@gmail.com wrote:

 Hello John,
 
 I did restart them (in fact, I did a full reboot of the machine). The
 error is still there.
 
 I guess my question is: is it expected that Hadoop needs to do
 something with the Snappycodec when mapred.compress.map.output is set
 to false?
 
 Regards,
 
 Bas
 
 On Sun, Apr 15, 2012 at 12:04 PM, john smith js1987.sm...@gmail.com wrote:
 Can you restart tasktrackers once and run the job again? It refreshes the
 class path.
 
 On Sun, Apr 15, 2012 at 11:58 AM, Bas Hickendorff
 hickendorff...@gmail.com wrote:
 
 Thanks.
 
 The native snappy libraries I have installed. However, I use the
 normal jars that you get when downloading Hadoop, I am not compiling
 Hadoop myself.
 
 I do not want to use the snappy codec (I don't care about compression
 at the moment), but it seems it is needed anyway? I added this to the
 mapred-site.xml:
 
 <property>
   <name>mapred.compress.map.output</name>
   <value>false</value>
 </property>
 
 But it still fails with the error of my previous email (SnappyCodec not
 found).
 
 Regards,
 
 Bas
 
 
 On Sat, Apr 14, 2012 at 6:30 PM, Vinod Kumar Vavilapalli
 vino...@hortonworks.com wrote:
 
 Hadoop has integrated snappy via installed native libraries instead of
 snappy-java.jar (ref https://issues.apache.org/jira/browse/HADOOP-7206)
  - You need to have the snappy system libraries (snappy and
 snappy-devel) installed before you compile hadoop. (RPMs are available on
 the web, http://pkgs.org/centos-5-rhel-5/epel-i386/21/ for example)
  - When you build hadoop, you will need to compile the native
 libraries(by passing -Dcompile.native=true to ant) to avail snappy support.
  - You also need to make sure that snappy system library is available on
 the library path for all mapreduce tasks at runtime. Usually if you install
 them on /usr/lib or /usr/local/lib, it should work.
 
 HTH,
 +Vinod
 
 On Apr 14, 2012, at 4:36 AM, Bas Hickendorff wrote:
 
 Hello,
 
 When I start a map-reduce job, it starts, and after a short while,
 fails with the error below (SnappyCodec not found).
 
 I am currently starting the job from other Java code (so the Hadoop
 executable in the bin directory is not used anymore), but in principle
 this seems to work (in the admin of the Jobtracker the job shows up
 when it starts). However after a short while the map task fails with:
 
 
 java.lang.IllegalArgumentException: Compression codec
 org.apache.hadoop.io.compress.SnappyCodec not found.
   at
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
   at
 org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:134)
   at
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:62)
   at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.io.compress.SnappyCodec
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
   at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:334)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:264)
   at
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
   at
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
   ... 10 more
 
 
 I confirmed that the SnappyCodec class is present in the
 hadoop-core-1.0.2.jar, and the snappy-java-1.0.4.1.jar is present as
 well. The directory of those jars is on the HADOOP_CLASSPATH, but it
 seems it still cannot find it. I also checked that the config files of
 Hadoop are read. I run all nodes on localhost.
 
 Any suggestions on what could be the cause of the issue?
 
 Regards,
 
 Bas
 
 


Re: getting UnknownHostException

2012-04-15 Thread Sujit Dhamale
Hi Madhu,
After making the modification in /etc/hosts it's working fine.

Thanks a lot :)

Kind Regards
Sujit Dhamale
(+91 9970086652)

On Fri, Apr 13, 2012 at 10:49 AM, madhu phatak phatak@gmail.com wrote:

 Please check contents of /etc/hosts for the hostname and ipaddress mapping.

 On Thu, Apr 12, 2012 at 11:11 PM, Sujit Dhamale sujitdhamal...@gmail.com
 wrote:

  Hi Friends ,
  i am getting UnknownHostException while executing Hadoop Word count
 program
 
  getting below details from job tracker Web page
 
  *User:* sujit
  *Job Name:* word count
  *Job File:*
 
 
 hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/sujit/.staging/job_201204112234_0002/job.xml
  http://localhost:50030/jobconf.jsp?jobid=job_201204112234_0002
  *Submit Host:* sujit.(null)
  *Submit Host Address:* 127.0.1.1
  *Job-ACLs: All users are allowed*
  *Job Setup:*None
  *Status:* Failed
  *Failure Info:*Job initialization failed: java.net.UnknownHostException:
  sujit.(null) is not a valid Inet address at org.apache.hadoop.net.
  NetUtils.verifyHostnames(NetUtils.java:569) at
  org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:711)
 at
  org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:4207) at
 
 
 org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at
 
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:662)
  *Started at:* Wed Apr 11 22:36:46 IST 2012
  *Failed at:* Wed Apr 11 22:36:47 IST 2012
  *Failed in:* 0sec
  *Job Cleanup:*None
 
 
 
 
  Can some one help me how to resolve this issue .
  i tried with : http://wiki.apache.org/hadoop/UnknownHost
 
  but still not able to resolve issue ,
  please help me out .
 
 
  Hadoop Version: hadoop-1.0.1.tar.gz
  java version 1.6.0_30
  Operating System : Ubuntu 11.10
 
 
  *Note *: All node were up before starting execution of Program
 
  Kind Regards
  Sujit Dhamale
  http://wiki.apache.org/hadoop/UnknownHost
 



 --
 https://github.com/zinnia-phatak-dev/Nectar
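For anyone hitting the same "sujit.(null) is not a valid Inet address" failure: the fix that worked here was making the machine's hostname resolve to a usable address in /etc/hosts rather than the Ubuntu default 127.0.1.1 entry (the job page above shows Submit Host Address 127.0.1.1). A sketch of such an entry for a single-node setup; the hostname and address below are placeholders:

127.0.0.1     localhost
192.168.1.10  sujit    # placeholder; use the host's real IP, or 127.0.0.1 for a purely local single-node setup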



Re: Issue with loading the Snappy Codec

2012-04-15 Thread Edward Capriolo
You need three things. 1) Install snappy in a place the system can pick
it up automatically, or add it to your java.library.path.

2) Then add the full name of the codec to io.compression.codecs.

hive> set io.compression.codecs;
io.compression.codecs=org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec

Edward
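That list is also the likely answer to the earlier question: CompressionCodecFactory instantiates every codec named in io.compression.codecs when a record reader is initialized, so if SnappyCodec is in the list (as in many 1.0.x/CDH default configs) it gets loaded even with map-output compression turned off. If Snappy is not needed at all, a sketch of overriding the list in core-site.xml to drop it, keeping the other stock codecs from the output above:

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>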


On Sun, Apr 15, 2012 at 8:36 AM, Bas Hickendorff
hickendorff...@gmail.com wrote:
 Hello Jay,

 My input is just a csv file (created it myself), so I am sure it is
 not compressed in any way. Also, the same input works when I use the
 standalone example (using the hadoop executable in the bin folder).
 When I try to integrate it in a larger java program it fails  :(

 Regards,

 Bas

 On Sun, Apr 15, 2012 at 2:30 PM, JAX jayunit...@gmail.com wrote:
 That is odd why would it crash when your m/r job did not rely on snappy?

 One possibility : Maybe because your input is snappy compressed, Hadoop is 
 detecting that compression, and trying to use the snappy codec to 
 decompress.?

 Jay Vyas
 MMSB
 UCHC

 On Apr 15, 2012, at 5:08 AM, Bas Hickendorff hickendorff...@gmail.com 
 wrote:

 Hello John,

 I did restart them (in fact, I did a full reboot of the machine). The
 error is still there.

 I guess my question is: is it expected that Hadoop needs to do
 something with the Snappycodec when mapred.compress.map.output is set
 to false?

 Regards,

 Bas

 On Sun, Apr 15, 2012 at 12:04 PM, john smith js1987.sm...@gmail.com wrote:
 Can you restart tasktrackers once and run the job again? It refreshes the
 class path.

 On Sun, Apr 15, 2012 at 11:58 AM, Bas Hickendorff
 hickendorff...@gmail.com wrote:

 Thanks.

 The native snappy libraries I have installed. However, I use the
 normal jars that you get when downloading Hadoop, I am not compiling
 Hadoop myself.

 I do not want to use the snappy codec (I don't care about compression
 at the moment), but it seems it is needed anyway? I added this to the
 mapred-site.xml:

 <property>
         <name>mapred.compress.map.output</name>
         <value>false</value>
 </property>

 But it still fails with the error of my previous email (SnappyCodec not
 found).

 Regards,

 Bas


 On Sat, Apr 14, 2012 at 6:30 PM, Vinod Kumar Vavilapalli
 vino...@hortonworks.com wrote:

 Hadoop has integrated snappy via installed native libraries instead of
 snappy-java.jar (ref https://issues.apache.org/jira/browse/HADOOP-7206)
  - You need to have the snappy system libraries (snappy and
 snappy-devel) installed before you compile hadoop. (RPMs are available on
 the web, http://pkgs.org/centos-5-rhel-5/epel-i386/21/ for example)
  - When you build hadoop, you will need to compile the native
 libraries(by passing -Dcompile.native=true to ant) to avail snappy 
 support.
  - You also need to make sure that snappy system library is available on
 the library path for all mapreduce tasks at runtime. Usually if you 
 install
 them on /usr/lib or /usr/local/lib, it should work.

 HTH,
 +Vinod

 On Apr 14, 2012, at 4:36 AM, Bas Hickendorff wrote:

 Hello,

 When I start a map-reduce job, it starts, and after a short while,
 fails with the error below (SnappyCodec not found).

 I am currently starting the job from other Java code (so the Hadoop
 executable in the bin directory is not used anymore), but in principle
 this seems to work (in the admin of the Jobtracker the job shows up
 when it starts). However after a short while the map task fails with:


 java.lang.IllegalArgumentException: Compression codec
 org.apache.hadoop.io.compress.SnappyCodec not found.
       at
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
       at
 org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:134)
       at
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:62)
       at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
       at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:416)
       at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
       at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.io.compress.SnappyCodec
       at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
       at java.security.AccessController.doPrivileged(Native Method)
       at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
       at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
       at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
       at 

Basic setup questions on Ubuntu

2012-04-15 Thread shan s
I am a newbie to Unix/Hadoop and have basic questions about CDH3 setup.


I installed CDH3 on an Ubuntu 11.0 box. I want to set up a pseudo-distributed
cluster where I can run my Pig jobs in mapreduce mode.
How do I achieve that?

1. I could not find the core-site.xml, hdfs-site.xml and mapred-site.xml
files with all default parameters set. Where are these located?
 (I see the files under the example-conf dir, but I guess they are example
files.)
2. I see several config files under /usr/lib/hadoop/conf, but all of them
are empty, with comments saying they can be used to override the
configuration; they are also read-only. What is the intention behind
these files being read-only?


Many Thanks,
Prashant
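On questions 1 and 2: on a CDH3 package install the active configuration directory is normally /etc/hadoop/conf, selected through the alternatives system, and /usr/lib/hadoop/conf is typically a symlink into it; the built-in defaults live inside the jars as core-default.xml, hdfs-default.xml and mapred-default.xml and are not meant to be edited directly. A quick sketch for checking this (the alternative name hadoop-0.20-conf is an assumption based on stock CDH3 packaging):

ls -l /usr/lib/hadoop/conf                       # usually a symlink chain ending in /etc/hadoop/conf
sudo update-alternatives --display hadoop-0.20-conf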


upload hang at DFSClient$DFSOutputStream.close(3488)

2012-04-15 Thread Mingxi Wu
Hi, 

I use hadoop cloudera 0.20.2-cdh3u0.

I have a program which uploads local files to HDFS every hour.

Basically, I open a gzip input stream with in = new GZIPInputStream(fin); and 
write to an HDFS file. After less than two days it hangs, at 
FSDataOutputStream.close(86).
Here is the stack:

State: WAITING Running 16660 ms (user 13770 ms) blocked 11276 times for  ms 
waiting 11209 times for  ms
LockName: java.util.LinkedList@f1ca0de LockOwnerId: -1
java.lang.Object.wait(-2)
java.lang.Object.wait(485)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(3468)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(3457)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(3549)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(3488)
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(61)
org.apache.hadoop.fs.FSDataOutputStream.close(86)
org.apache.hadoop.io.IOUtils.copyBytes(59)
org.apache.hadoop.io.IOUtils.copyBytes(74)

Any suggestion to avoid this issue? It seems this is a bug in Hadoop. I found 
the issue is less severe when my upload server does one upload at a time 
instead of using multiple concurrent uploads.

Thanks,

Mingxi
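For context, a minimal sketch of the upload path described above (class name, paths and file names are placeholders, not the original code); separating the copy from the closes makes it clear that the reported hang happens inside close(), while DFSOutputStream waits for the datanode pipeline to acknowledge the final packets:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HourlyUpload {
    public static void upload(String localGzFile, String hdfsPath) throws IOException {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        InputStream in = null;
        FSDataOutputStream out = null;
        try {
            in = new GZIPInputStream(new FileInputStream(localGzFile));
            out = fs.create(new Path(hdfsPath));
            // Copy without letting copyBytes close the streams, so the close below is explicit.
            IOUtils.copyBytes(in, out, conf, false);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);   // the stack above shows the hang inside this close()
        }
    }
}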


Re: Basic setup questions on Ubuntu

2012-04-15 Thread Manish Bhoge
Prashant,
Post your questions to cdh-u...@cloudera.org.

Follow the CDH3 installation guide. After installing the package and individual 
components you need to configure the configuration files such as core-site.xml, 
hdfs-site.xml, etc.

Thanks
Manish
Sent from my BlackBerry, pls excuse typo

-Original Message-
From: shan s mysub...@gmail.com
Date: Mon, 16 Apr 2012 02:49:51 
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Basic setup questions on Ubuntu

I am a newbie to Unix/Hadoop and have basic questions about CDH3 setup.


I installed CDH3 on Ubuntu 11.0 Unix box. I want to setup a sudo
cluster where I can  run my pig jobs under mapreduce mode.
How do I achieve that?

1. I couldd not find the  core-site.xml. hdfs-site.xml and mapred-site.xml
files with all default parameters set? Where are these located.
 (I see the files under example-conf. dir, but I guess they are example
files)
2. I see several config files under /usr/lib/hadoop/conf. But all of them
are empty files, with the comments that these can be used to override the
configuration, but these are read-only files. What is the intention of
these files being read-only.


Many Thanks,
Prashant



RE: upload hang at DFSClient$DFSOutputStream.close(3488)

2012-04-15 Thread Uma Maheswara Rao G
Hi Mingxi,

In your thread dump, did you check the DataStreamer thread? Is it running?

If the DataStreamer thread is not running, then this issue is most likely the same as 
HDFS-2850.

Did you find any OOME in your clients?

Regards,
Uma
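A quick way to check that on the running upload process; the pid below is a placeholder for the client JVM's process id:

jstack <pid> > client-threads.txt
grep -B 5 -A 20 DFSOutputStream client-threads.txt   # both the blocked writer and, if alive, the DataStreamer thread have frames matching this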

From: Mingxi Wu [mingxi...@turn.com]
Sent: Monday, April 16, 2012 7:25 AM
To: common-user@hadoop.apache.org
Subject: upload hang at DFSClient$DFSOutputStream.close(3488)

Hi,

I use hadoop cloudera 0.20.2-cdh3u0.

I have a program which uploads local files to HDFS every hour.

Basically, I open a gzip input stream by in= new GZIPInputStream(fin); And 
write to HDFS file. After less than two days, it will hang. It hangs at 
FSDataOutputStream.close(86).
Here is the stack:

State: WAITING Running 16660 ms (user 13770 ms) blocked 11276 times for  ms 
waiting 11209 times for  ms
LockName: java.util.LinkedList@f1ca0de LockOwnerId: -1
java.lang.Object.wait(-2)
java.lang.Object.wait(485)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.waitForAckedSeqno(3468)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(3457)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(3549)
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(3488)
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(61)
org.apache.hadoop.fs.FSDataOutputStream.close(86)
org.apache.hadoop.io.IOUtils.copyBytes(59)
org.apache.hadoop.io.IOUtils.copyBytes(74)

Any suggestion to avoid this issue? It seems this is a bug in hadoop. I found 
this issue is less severe when my upload server do one upload at a time, 
instead of using multiple concurrent uploads.

Thanks,

Mingxi