Re: fail and kill all tasks without killing job.

2012-07-20 Thread JAX
I believe that kill-task simply kills the task, but then the same process (i.e. 
task) starts again with a new id.

Jay Vyas 
MMSB
UCHC
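(Tangentially, a minimal sketch, not from the thread: with the old mapred API, attempts ended by -fail-task, unlike ones ended by -kill-task, count against the per-task retry limit, so capping the attempts and tolerating failed tasks lets the job wind down instead of re-delegating forever.)

import org.apache.hadoop.mapred.JobConf;

// Minimal sketch; MyDriver is a hypothetical driver class.
JobConf conf = new JobConf(MyDriver.class);
conf.setMaxMapAttempts(2);               // failed (not killed) attempts count toward this limit
conf.setMaxMapTaskFailuresPercent(100);  // let the job finish even if some map tasks fail for good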

On Jul 20, 2012, at 6:23 PM, Bejoy KS bejoy.had...@gmail.com wrote:

 Hi Jay
 
 Did you try
 hadoop job -kill-task task-id ? And is that not working as desired? 
 
 Regards
 Bejoy KS
 
 Sent from handheld, please excuse typos.
 
 -Original Message-
 From: jay vyas jayunit...@gmail.com
 Date: Fri, 20 Jul 2012 17:17:58 
 To: common-user@hadoop.apache.orgcommon-user@hadoop.apache.org
 Reply-To: common-user@hadoop.apache.org
 Subject: fail and kill all tasks without killing job.
 
 Hi guys : I want my tasks to end/fail, but I don't want to kill my 
 entire hadoop job.
 
 I have a hadoop job that runs 5 hadoop jobs in a row.
 I'm on the last of those sub-jobs, and want to fail all tasks so that the 
 task tracker stops delegating them,
 and the hadoop main job can naturally come to a close.
 
 However, when I run hadoop job -kill-task / -fail-task, the 
 jobtracker seems to simply relaunch
 the same tasks with new ids.
 
 How can I tell the jobtracker to give up on redelegating?
 


Re: remote job submission

2012-04-21 Thread JAX
Thanks Harsh J: 
I have another question, though ---

You mentioned that :

The client needs access to 
 the
DataNodes (for actually writing the previous files to DFS for the
JobTracker to pick up) 

What do you mean by previous files? It seems like, if designing Hadoop from 
scratch, I wouldn't want to force the client to communicate with data nodes at 
all, since those can be added and removed during a job.

Jay Vyas 
MMSB
UCHC

On Apr 21, 2012, at 1:14 AM, Harsh J ha...@cloudera.com wrote:

 the
 DataNodes (for actually writing the previous files to DFS for the
 JobTracker to pick up)


Re: Accessing global Counters

2012-04-20 Thread JAX
No, reducers can't access mapper counters.
--- Maybe there's a way to stage counters in the distributed 
cache as an intermediate step?

Jay Vyas 
MMSB
UCHC

On Apr 20, 2012, at 1:24 PM, Robert Evans ev...@yahoo-inc.com wrote:

 There was a discussion about this several months ago
 
 http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201112.mbox/%3CCADYHM8xiw8_bF=zqe-bagdfz6r3tob0aof9viozgtzeqgkp...@mail.gmail.com%3E
 
 The conclusion is that if you want to read them from the reducer you are 
 going to have to do something special until someone finds time to implement 
 it as part of:
 
 https://issues.apache.org/jira/browse/MAPREDUCE-3520
 
 --Bobby Evans
 
 
 On 4/20/12 11:36 AM, Amith D K amit...@huawei.com wrote:
 
 Yes, you can use a user-defined counter as Jagat suggested.
 
 A counter can be an enum, as Jagat described, or any string; the latter are called 
 dynamic counters.
 
 It is easier to use enum counters than dynamic counters; ultimately it depends on 
 your use case :)
 
 Amith
 
 From: Jagat [jagatsi...@gmail.com]
 Sent: Saturday, April 21, 2012 12:25 AM
 To: common-user@hadoop.apache.org
 Subject: Re: Accessing global Counters
 
 Hi
 
 You can create your own counters like
 
 enum CountFruits {
 Apple,
 Mango,
 Banana
 }
 
 
 And in your mapper class, when you see the condition to increment, you can use
 the Reporter incrCounter method to do the same.
 
 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/Reporter.html#incrCounter(java.lang.Enum,%20long)
 
 e.g.
 // I saw an Apple; increment it by one
 reporter.incrCounter(CountFruits.Apple, 1);
 
 Now you can access them using job.getCounters
 
 http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/Job.html#getCounters()
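 For example, on the client side once the job is done (a minimal sketch against the new
 mapreduce API linked above, assuming a mapreduce.Job object named job and the CountFruits
 enum from earlier):
 
 job.waitForCompletion(true);
 Counters counters = job.getCounters();
 long apples = counters.findCounter(CountFruits.Apple).getValue();
 System.out.println("Apples seen: " + apples);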
 
 Hope this helps
 
 Regards,
 
 Jagat Singh
 
 
 On Fri, Apr 20, 2012 at 9:43 PM, Gayatri Rao rgayat...@gmail.com wrote:
 
 Hi All,
 
 Is there a way for me to set global counters in Mapper and access them from
 reducer?
 Could you suggest how I can achieve this?
 
 Thanks
 Gayatri
 
 


Re: remote job submission

2012-04-20 Thread JAX
RE Arindam's question on how to submit a job remotely.

Here are my follow up questions - hope this helps to guide the discussion: 

1) Normally - what is the job client? Do you guys typically use the namenode 
as the client? 

2) In the case where the client != namenode, how does the client know how 
to start up the task trackers?

UCHC

On Apr 20, 2012, at 11:19 AM, Amith D K amit...@huawei.com wrote:

 I don't know your use case; if it's for testing and ssh across the machines is not 
 disabled, then you could write a script that runs the jobs over ssh using the CLI. 
 You can check ssh usage.
 
 Or else use Oozie.
 
 From: Robert Evans [ev...@yahoo-inc.com]
 Sent: Friday, April 20, 2012 11:17 PM
 To: common-user@hadoop.apache.org
 Subject: Re: remote job submission
 
 You can use Oozie to do it.
 
 
 On 4/20/12 8:45 AM, Arindam Choudhury arindamchoudhu...@gmail.com wrote:
 
 Sorry, but can you give me an example?
 
 On Fri, Apr 20, 2012 at 3:08 PM, Harsh J ha...@cloudera.com wrote:
 
 Arindam,
 
 If your machine can access the clusters' NN/JT/DN ports, then you can
 simply run your job from the machine itself.
 
 On Fri, Apr 20, 2012 at 6:31 PM, Arindam Choudhury
 arindamchoudhu...@gmail.com wrote:
 If you are allowed a remote connection to the cluster's service ports,
 then you can directly submit your jobs from your local CLI. Just make
 sure your local configuration points to the right locations.
 
 Can you elaborate in detail, please?
 
 On Fri, Apr 20, 2012 at 2:20 PM, Harsh J ha...@cloudera.com wrote:
 
 If you are allowed a remote connection to the cluster's service ports,
 then you can directly submit your jobs from your local CLI. Just make
 sure your local configuration points to the right locations.
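 As a minimal sketch of what that can look like from Java (hypothetical host names,
 Hadoop 1.x property names):
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.mapreduce.Job;
 
 Configuration conf = new Configuration();
 conf.set("fs.default.name", "hdfs://namenode.example.com:8020");   // the cluster's NN
 conf.set("mapred.job.tracker", "jobtracker.example.com:8021");     // the cluster's JT
 Job job = new Job(conf, "submitted-remotely");
 // ...set mapper/reducer/input/output as usual, then:
 job.submit();   // or job.waitForCompletion(true)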
 
 Otherwise, perhaps you can choose to use Apache Oozie (Incubating)
 (http://incubator.apache.org/oozie/). It does provide a REST interface
 that launches jobs for you on the supplied clusters, but it's more
 oriented towards workflow management. Or perhaps HUE:
 https://github.com/cloudera/hue
 
 On Fri, Apr 20, 2012 at 5:37 PM, Arindam Choudhury
 arindamchoudhu...@gmail.com wrote:
 Hi,
 
 Do hadoop have any web service or other interface so I can submit jobs
 from
 remote machine?
 
 Thanks,
 Arindam
 
 
 
 --
 Harsh J
 
 
 
 
 --
 Harsh J
 
 


Reporter vs context

2012-04-20 Thread JAX
Hi guys : I notice that there's been some chatter about the Reporter in the 
context of counters. Forgive my ignorance here, as I've never seen Reporters 
used in real code.

What is the difference between the use of the Context and Reporter objects, 
and how are they related? Is there any overlap in their functionality?


Jay Vyas 
MMSB
UCHC
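For comparison, a minimal sketch of the two map() signatures (hypothetical word-count-style bodies, not from any real code): the old mapred API passes a Reporter (counters, status) next to the OutputCollector (output), while the new mapreduce API's Context covers both roles.

// Old API (org.apache.hadoop.mapred), inside an implementation of Mapper:
public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> out, Reporter reporter) throws IOException {
  reporter.incrCounter("fruit", "apple", 1);   // counters/status go through the Reporter
  out.collect(value, new IntWritable(1));      // output goes through the OutputCollector
}

// New API (org.apache.hadoop.mapreduce), inside a subclass of Mapper:
public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
  context.getCounter("fruit", "apple").increment(1);   // same counter, via the Context
  context.write(value, new IntWritable(1));            // same output, via the Context
}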

Re: Issue with loading the Snappy Codec

2012-04-15 Thread JAX
That is odd. Why would it crash when your m/r job did not rely on snappy? 

One possibility: maybe because your input is snappy compressed, Hadoop is 
detecting that compression and trying to use the snappy codec to decompress it?

Jay Vyas 
MMSB
UCHC

On Apr 15, 2012, at 5:08 AM, Bas Hickendorff hickendorff...@gmail.com wrote:

 Hello John,
 
 I did restart them (in fact, I did a full reboot of the machine). The
 error is still there.
 
 I guess my question is: is it expected that Hadoop needs to do
 something with the Snappycodec when mapred.compress.map.output is set
 to false?
 
 Regards,
 
 Bas
 
 On Sun, Apr 15, 2012 at 12:04 PM, john smith js1987.sm...@gmail.com wrote:
 Can you restart tasktrackers once and run the job again? It refreshes the
 class path.
 
 On Sun, Apr 15, 2012 at 11:58 AM, Bas Hickendorff
 hickendorff...@gmail.comwrote:
 
 Thanks.
 
 The native snappy libraries I have installed. However, I use the
 normal jars that you get when downloading Hadoop, I am not compiling
 Hadoop myself.
 
 I do not want to use the snappy codec (I don't care about compression
 at the moment), but it seems it is needed anyway? I added this to the
 mapred-site.xml:
 
 <property>
   <name>mapred.compress.map.output</name>
   <value>false</value>
 </property>
 
 But it still fails with the error of my previous email (SnappyCodec not
 found).
 
 Regards,
 
 Bas
 
 
 On Sat, Apr 14, 2012 at 6:30 PM, Vinod Kumar Vavilapalli
 vino...@hortonworks.com wrote:
 
 Hadoop has integrated snappy via installed native libraries instead of
 snappy-java.jar (ref https://issues.apache.org/jira/browse/HADOOP-7206)
  - You need to have the snappy system libraries (snappy and
 snappy-devel) installed before you compile hadoop. (RPMs are available on
 the web, http://pkgs.org/centos-5-rhel-5/epel-i386/21/ for example)
  - When you build hadoop, you will need to compile the native
 libraries(by passing -Dcompile.native=true to ant) to avail snappy support.
  - You also need to make sure that snappy system library is available on
 the library path for all mapreduce tasks at runtime. Usually if you install
 them on /usr/lib or /usr/local/lib, it should work.
 
 HTH,
 +Vinod
 
 On Apr 14, 2012, at 4:36 AM, Bas Hickendorff wrote:
 
 Hello,
 
 When I start a map-reduce job, it starts, and after a short while,
 fails with the error below (SnappyCodec not found).
 
 I am currently starting the job from other Java code (so the Hadoop
 executable in the bin directory is not used anymore), but in principle
 this seems to work (in the admin of the Jobtracker the job shows up
 when it starts). However after a short while the map task fails with:
 
 
 java.lang.IllegalArgumentException: Compression codec
 org.apache.hadoop.io.compress.SnappyCodec not found.
   at
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:96)
   at
 org.apache.hadoop.io.compress.CompressionCodecFactory.init(CompressionCodecFactory.java:134)
   at
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:62)
   at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:522)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:416)
   at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
   at org.apache.hadoop.mapred.Child.main(Child.java:249)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.io.compress.SnappyCodec
   at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
   at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:334)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:264)
   at
 org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
   at
 org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:89)
   ... 10 more
 
 
 I confirmed that the SnappyCodec class is present in the
 hadoop-core-1.0.2.jar, and the snappy-java-1.0.4.1.jar is present as
 well. The directory of those jars is on the HADOOP_CLASSPATH, but it
 seems it still cannot find it. I also checked that the config files of
 Hadoop are read. I run all nodes on localhost.
 
 Any suggestions on what could be the cause of the issue?
 
 Regards,
 
 Bas
 
 


Re: Professional Hiring: Architect and Developer in Hadoop Area ( Beijing, China )

2012-04-09 Thread JAX
I'm sure I speak quite accurately for the moderators that ***This is not a job 
board***

Jay Vyas 
MMSB
UCHC

On Apr 9, 2012, at 10:03 AM, Vishal Kumar Gupta groups...@gmail.com wrote:

 hi Sarah,
 
 Please find my updated resume attached with this mail.
 
 Regards,
 vishal
 
 2012/4/9 Bing Li sarah.lib...@gmail.com
 An internationally renowned large IT company (top 3) is hiring Hadoop experts for its 
 development center in Beijing - not a headhunter
 
 Job description:
 Hadoop system and platform development (architect, senior developer)
 
 
 Requirements:
 
 1. Experience designing and developing large distributed systems (3+ years of work 
 experience, 5+ years for architects); large-scale real-world Hadoop experience preferred
 
 2. Solid programming and debugging experience (Java or C++/C), a strong grounding in 
 computer science theory, and the ability to learn quickly
 3. Strong communication and collaboration skills; proficient in English (including spoken)
 
 *We offer a competitive package - welcome to join us*
 
 If interested, please send your resume to: sarah.lib...@gmail.com
 


Job, JobConf, and Configuration.

2012-04-08 Thread JAX
Hi guys. Just a theoretical question here: I notice in chapter 1 of the 
Hadoop O'Reilly book that the new API example has *no* Configuration object.

Why is that? 

I thought the new API still uses / needs a Configuration class when running 
jobs.



Jay Vyas 
MMSB
UCHC
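(For what it's worth, a minimal sketch of new-API setup, with MyDriver as a hypothetical class: the Configuration is still there, it's just wrapped by the Job, which copies it in at construction time.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();   // still loads core-site.xml, mapred-site.xml, etc.
Job job = new Job(conf, "example-job");     // the Job wraps a copy of the Configuration
job.setJarByClass(MyDriver.class);
// job.getConfiguration() hands the wrapped Configuration back if you need to tweak it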

On Apr 7, 2012, at 4:29 PM, Harsh J ha...@cloudera.com wrote:

 MapReduce sets mapred.child.tmp for all tasks to be the Task
 Attempt's WorkingDir/tmp automatically. This also sets the
 -Djava.io.tmpdir prop for each task at JVM boot.
 
 Hence you may use the regular Java API to create a temporary file:
 http://docs.oracle.com/javase/6/docs/api/java/io/File.html#createTempFile(java.lang.String,%20java.lang.String)
 
 These files would also be automatically deleted away after the task
 attempt is done.
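 A minimal sketch of what that looks like inside a map() (hypothetical use; in practice
 the size limit is just the local disk on the node running the attempt):
 
 // java.io.tmpdir already points at the attempt's working dir, so no path juggling needed
 File scratch = File.createTempFile("scratch-", ".tmp");
 BufferedWriter w = new BufferedWriter(new FileWriter(scratch));
 w.write(value.toString());   // value being the hypothetical map input
 w.close();
 // ...read it back as needed; it is swept away with the attempt's working directory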
 
 On Sun, Apr 8, 2012 at 2:14 AM, Ondřej Klimpera klimp...@fit.cvut.cz wrote:
 Hello,
 
 I would like to ask you if it is possible to create and work with a
 temporary file while in a map function.
 
 I suppose that map function is running on a single node in Hadoop cluster.
 So what is a safe way to create a temporary file and read from it in one
 map() run? If it is possible, is there a size limit for the file?
 
 The file can not be created before hadoop job is created. I need to create
 and process the file inside map().
 
 Thanks for your answer.
 
 Ondrej Klimpera.
 
 
 
 -- 
 Harsh J


Re: Get Current Block or Split ID, and using it, the Block Path

2012-04-08 Thread JAX
I have a related question about blocks related to this... Normally, a reduce 
job outputs several files, all in the same directory.

But why? Since we know that Hadoop is abstracting our files for us, shouldn't 
the part-r-* outputs ultimately be thought of as a single file?

What is the correspondence between the

Part-r-0000
Part-r-0001
...

Outputs from a reducer, and the native blocks stored by HDFS (if any)?

Jay Vyas 
MMSB
UCHC

On Apr 8, 2012, at 2:00 PM, Harsh J ha...@cloudera.com wrote:

 Deepak
 
 On Sun, Apr 8, 2012 at 9:46 PM, Deepak Nettem deepaknet...@gmail.com wrote:
 Hi,
 
 Is it possible to get the 'id' of the currently executing split or block
 from within the mapper? Using this block Id / split id, I want to be able
 to query the namenode to get the names of hosts having that block / spllit,
 and the actual path to the data.
 
 You can get the list of host locations for the current Mapper's split
 item via: https://gist.github.com/2339170 (or generally from a
 FileSystem object via https://gist.github.com/2339181)
 
 You can't get block IDs via any available publicly supported APIs.
 Therefore, you may consider getting the local block file path as an
 unavailable option too.
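 In case those gists go stale, a minimal sketch of both routes (new mapreduce API on the
 mapper side; the path is hypothetical):
 
 // Inside a Mapper: the current split knows its preferred hosts.
 String[] splitHosts = context.getInputSplit().getLocations();
 
 // From a FileSystem object, for an arbitrary file:
 FileSystem fs = FileSystem.get(conf);
 FileStatus st = fs.getFileStatus(new Path("/data/input.txt"));
 for (BlockLocation b : fs.getFileBlockLocations(st, 0, st.getLen())) {
   System.out.println(b.getOffset() + " -> " + Arrays.toString(b.getHosts()));
 }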
 
 I need this for some analytics that I'm doing. Is there a client API that
 allows doing this?  If not, what's the best way to do this?
 
 There are some ways to go about it (I wouldn't consider this
 impossible to do for sure), but I'm curious what your 'analytics' is
 and how it correlates with needing block IDs and actual block file
 paths - Cause your problem may also be solvable by other,
 pre-available means.
 
 -- 
 Harsh J


Namespace logs : a common issue?

2012-04-06 Thread JAX
Hi guys : I'm noticing that namespace conflicts or differences are a common 
theme in Hadoop, both in my experience and now on this listserv.

Does anyone have any thoughts on why this is such a common issue and how it 
will be dealt with in new releases?

Jay Vyas 
MMSB
UCHC

Hadoop fs custom commands

2012-04-01 Thread JAX
Hi guys : I wanted to make some custom Hadoop fs commands. Is this 
feasible/practical? In particular, I wanted to summarize file sizes and 
print some useful estimates of things on the fly from my cluster.

I'm not sure how the Hadoop shell commands are implemented... but I thought 
maybe there is a higher-level shell language or API which they might use that 
I can play with?
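The built-in commands live in org.apache.hadoop.fs.FsShell, but for summaries you can skip that layer entirely; a minimal sketch (hypothetical path) against the plain FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
ContentSummary cs = fs.getContentSummary(new Path("/user/jay"));
System.out.println("files: " + cs.getFileCount()
    + ", dirs: " + cs.getDirectoryCount()
    + ", bytes: " + cs.getLength());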

Re: namespace error after formatting namenode (pseudo-distributed mode).

2012-03-30 Thread JAX
Thanks alot arpit : I will try this first thing in the morning.

For now --- I need a glass of wine.

Jay Vyas 
MMSB
UCHC

On Mar 30, 2012, at 10:38 PM, Arpit Gupta ar...@hortonworks.com wrote:

 the namespace ID is persisted in the datanode data directories. As you 
 formatted the namenode, these IDs no longer match.
 
 So stop the datanode, clean up your dfs.data.dir on your system (which from the 
 logs seems to be /private/tmp/hadoop-Jpeerindex/dfs/data), and then start the 
 datanode.
 
 --
 Arpit Gupta
 Hortonworks Inc.
 http://hortonworks.com/
 
 On Mar 30, 2012, at 2:33 PM, Jay Vyas wrote:
 
 Hi guys !
 
 This is very strange - I have formatted my namenode (pseudo-distributed
 mode) and now I'm getting some kind of namespace error.
 
 Without further ado, here is the interesting output of my logs.
 
 
 Last login: Fri Mar 30 19:29:12 on ttys009
 doolittle-5:~ Jpeerindex$
 doolittle-5:~ Jpeerindex$
 doolittle-5:~ Jpeerindex$ cat Development/hadoop-0.20.203.0/logs/*
 2012-03-30 22:28:31,640 INFO
 org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting DataNode
 STARTUP_MSG:   host = doolittle-5.local/192.168.3.78
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 0.20.203.0
 STARTUP_MSG:   build =
 http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20-security-203-r
 1099333; compiled by 'oom' on Wed May  4 07:57:50 PDT 2011
 /
 2012-03-30 22:28:32,138 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
 loaded properties from hadoop-metrics2.properties
 2012-03-30 22:28:32,190 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
 MetricsSystem,sub=Stats registered.
 2012-03-30 22:28:32,191 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
 period at 10 second(s).
 2012-03-30 22:28:32,191 INFO
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system
 started
 2012-03-30 22:28:32,923 INFO
 org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
 registered.
 2012-03-30 22:28:32,959 WARN
 org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
 exists!
 2012-03-30 22:28:34,478 INFO org.apache.hadoop.ipc.Client: Retrying connect
 to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
 2012-03-30 22:28:36,317 ERROR
 org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException:
 Incompatible namespaceIDs in /private/tmp/hadoop-Jpeerindex/dfs/data:
 namenode namespaceID = 1829914379; datanode namespaceID = 1725952472
   at
 org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:232)
   at
 org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:147)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:354)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.init(DataNode.java:268)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1480)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1419)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1437)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1563)
   at
 org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1573)
 


Re: Question about accessing another HDFS

2011-12-08 Thread JAX
I was confused about this for a while also. I don't have all the details, but 
I think my question on Stack Overflow might help you.

I was playing with different protocols, trying to find a way to 
programmatically access all data in HDFS.

http://stackoverflow.com/questions/7844458/how-can-i-access-hadoop-via-the-hdfs-protocol-from-java

Jay Vyas 
MMSB
UCHC

On Dec 8, 2011, at 7:29 PM, Frank Astier fast...@yahoo-inc.com wrote:

 Can you show your code here ?  What URL protocol are you using ?
 
 I guess I’m being very naïve (and relatively new to HDFS). I can’t show too 
 much code, but basically, I’d like to do:
 
 Path myPath = new Path("hdfs://A.mycompany.com//some-dir");
 
 Where Path is a hadoop fs path. I think I can take it from there, if that 
 worked... Did you mean that I need to address the namenode with an http:// 
 address?
 
 Thanks!
 
 Frank
 
 On Thu, Dec 8, 2011 at 5:47 PM, Tom Melendez t...@supertom.com wrote:
 
 I'm hoping there is a better answer, but I'm thinking you could load
 another configuration file (with B.company in it) using Configuration,
 grab a FileSystem obj with that and then go forward.  Seems like some
 unnecessary overhead though.
 
 Thanks,
 
 Tom
 
 On Thu, Dec 8, 2011 at 2:42 PM, Frank Astier fast...@yahoo-inc.com
 wrote:
 Hi -
 
 We have two namenodes set up at our company, say:
 
 hdfs://A.mycompany.com
 hdfs://B.mycompany.com
 
 From the command line, I can do:
 
 Hadoop fs -ls hdfs://A.mycompany.com//some-dir
 
 And
 
 Hadoop fs -ls hdfs://B.mycompany.com//some-other-dir
 
 I’m now trying to do the same from a Java program that uses the HDFS
 API. No luck there. I get an exception: “Wrong FS”.
 
 Any idea what I’m missing in my Java program??
 
 Thanks,
 
 Frank
 
 
 
 
 --
 Jay Vyas
 MMSB/UCHC
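(A minimal sketch of one way around the "Wrong FS" exception, reusing the host name quoted above: bind the FileSystem to the path's own URI instead of relying on the default fs.default.name.)

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;

Configuration conf = new Configuration();
Path p = new Path("hdfs://A.mycompany.com/some-dir");
// Ask for the FileSystem matching this URI rather than the configured default one
FileSystem fsA = FileSystem.get(URI.create("hdfs://A.mycompany.com"), conf);
// or, equivalently: FileSystem fsA = p.getFileSystem(conf);
FileStatus[] listing = fsA.listStatus(p);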


Re: Hadoop MapReduce Poster

2011-11-01 Thread JAX
That's a great tutorial. I like the conciseness of it.

Jay Vyas 
MMSB
UCHC

On Nov 1, 2011, at 1:39 AM, Prashant Sharma prashant.ii...@gmail.com wrote:

 Hi Mathias,
 
   I wrote a small introduction or a quick ramp up for starting out with
 hadoop while learning it at my institute.
 http://functionalprograming.files.wordpress.com/2011/07/hadoop-2.pdf
 thanks
 -P
 
 On Mon, Oct 31, 2011 at 6:44 PM, Mathias Herberts 
 mathias.herbe...@gmail.com wrote:
 
 Hi,
 
 I'm in the process of putting together a 'Hadoop MapReduce Poster' so
 my students can better understand the various steps of a MapReduce job
 as ran by Hadoop.
 
 I intend to release the Poster under a CC-BY-NC-ND license.
 
 I'd be grateful if people could review the current draft (3) of the poster.
 
 It is available as a 200 dpi PNG here:
 
 http://www.flickr.com/photos/herberts/6298203371
 
 Any comment welcome.
 
 Mathias.
 


Re: getting there (EOF exception).

2011-10-30 Thread JAX
Thanks! Yes, I agree... but are you sure about 8020? 8020 serves on 127.0.0.1 (rather 
than 0.0.0.0)... thus it is inaccessible to outside clients. That is very 
odd. Why would that be the case? Any insights? (Using Cloudera's Hadoop VM.)

Sent from my iPad

On Oct 30, 2011, at 11:48 PM, Harsh J ha...@cloudera.com wrote:

 Hey Jay,
 
 I believe this may be related to your other issues as well, but 50070 is NOT 
 the port you want to connect to. 50070 serves over HTTP, while the default port 
 for IPC connections (fs.default.name) is 8020, or whatever you have 
 configured.
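 A minimal sketch of the client side with the IPC port (same IP as in the trace, assuming
 the default 8020):
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 
 Configuration conf = new Configuration();
 conf.set("fs.default.name", "hdfs://172.16.112.131:8020");   // NN IPC port, not the 50070 HTTP port
 FileSystem fs = FileSystem.get(conf);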
 
 On 31-Oct-2011, at 5:17 AM, Jay Vyas wrote:
 
 Hi guys : What is the meaning of an EOF exception when trying to connect
 to Hadoop by creating a new FileSystem object? Does this simply mean
 the system can't be read?
 
 java.io.IOException: Call to /172.16.112.131:50070 failed on local
 exception: java.io.EOFException
   at org.apache.hadoop.ipc.Client.wrapException(Client.java:1139)
   at org.apache.hadoop.ipc.Client.call(Client.java:1107)
   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
   at $Proxy0.getProtocolVersion(Unknown Source)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:398)
   at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:384)
   at
 org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:111)
   at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:213)
   at org.apache.hadoop.hdfs.DFSClient.init(DFSClient.java:180)
   at
 org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
   at
 org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1514)
   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
   at
 org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:1548)
   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1530)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:228)
   at sb.HadoopRemote.main(HadoopRemote.java:35)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readInt(DataInputStream.java:375)
   at
 org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:812)
   at org.apache.hadoop.ipc.Client$Connection.run(Client.java:720)
 
 -- 
 Jay Vyas
 MMSB/UCHC
 


Connecting to vm through java

2011-10-20 Thread JAX
 Hi guys : I'm getting the dreaded 

org.apache.hadoop.ipc.Client$Connection handleConnectionFailure 

When connecting to Cloudera's Hadoop (running in a VM) to request running a 
simple m/r job (from a machine outside the Hadoop VM)...

I've seen a lot of posts about this online, and it's also on Stack Overflow 
here: 
http://stackoverflow.com/questions/6997327/connecting-to-cloudera-vm-from-my-desktop

Any tips on debugging Java's connection to HDFS over the network?

It's not entirely clear to me how the connection is made/authenticated between 
the client and Hadoop. For example, is passwordless ssh required? I believe this 
error is related to authentication, but I'm not sure of the best way to test it... 
I have confirmed that the IP is valid, and it appears that HDFS is being run and 
served over the right default port in the VM.




Sent from my iPad