Re: What is the best way to use the Hadoop output data

2009-06-25 Thread Huy Phan
Can anybody help me with this? :) On Thu, Jun 25, 2009 at 5:02 PM, Huy Phan wrote: > Hi everybody, I'm working on a hadoop project that processes the log > files. In the reduce part, as usual, I store the output to HDFS, but I also > want to send that output data to the message queue using HTTP Post R

Re: Pregel

2009-06-25 Thread Owen O'Malley
On Jun 25, 2009, at 9:42 PM, Mark Kerzner wrote: my guess, as good as anybody's, is that Pregel is to large graphs what Hadoop is to large datasets. I think it is much more likely a language that allows you to easily define fixed-point algorithms. I would imagine a distributed version

Re: graphical tool for hadoop mapreduce

2009-06-25 Thread Kevin Weil
Some people at Sun have done some recent work on this -- see a blog post at http://blogs.sun.com/jgebis/entry/hadoop_resource_utilization_and_performance, and a subsequent post with more detail at http://blogs.sun.com/jgebis/entry/hadoop_resource_utilization_monitoring_scripts . Kevin On Thu, Jun

PIG and Hadoop

2009-06-25 Thread krishna prasanna
Hi, Here is my scenario: 1. I have a cluster of 3 machines. 2. I have a jar file that includes pig.jar. How can I run a jar (instead of a Pig script file) in Hadoop mode? For running a script file in Hadoop mode, "java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH org.apache.pig.Main script1-hadoop.pig" an
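
A minimal sketch of one way to do this: instead of invoking org.apache.pig.Main on a script, drive Pig from your own class through the embedded PigServer API and ship that class in your jar. The class name and the load/store paths below are illustrative, not from the original post:

    import java.io.IOException;
    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class EmbeddedPigJob {
        public static void main(String[] args) throws IOException {
            // MAPREDUCE mode picks up the cluster settings from the Hadoop
            // site config on the classpath ($HADOOPSITEPATH).
            PigServer pig = new PigServer(ExecType.MAPREDUCE);
            pig.registerQuery("A = LOAD 'input' USING PigStorage(',');");
            pig.registerQuery("B = FILTER A BY $0 IS NOT NULL;");
            // store() is what actually kicks off the MapReduce jobs
            pig.store("B", "output");
        }
    }

It would then run with the same classpath as the script case, e.g. java -cp myjob.jar:$PIGDIR/pig.jar:$HADOOPSITEPATH EmbeddedPigJob (myjob.jar being your own jar).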

Re: hadoop lucene integration

2009-06-25 Thread Nick Cen
There is sample index code under the contrib directory; maybe you can take a look. 2009/6/26 m.harig > > hi all > > I have work experience with Lucene but am new to Hadoop. I > created an index with Lucene; can anyone tell me how to use hadoop for my > lucene index on a distributed file

hadoop lucene integration

2009-06-25 Thread m.harig
hi all, I have work experience with Lucene but am new to Hadoop. I created an index with Lucene; can anyone tell me how to use Hadoop for my Lucene index on a distributed file system? If possible, can anyone send me an example or a link showing how I can use it for my index. Please. -- V

Pregel

2009-06-25 Thread Mark Kerzner
Hi all, my guess, as good as anybody's, is that Pregel is to large graphs what Hadoop is to large datasets. In other words, Pregel is the next natural step for massively scalable computations after Hadoop. And, as with MapReduce, Google will talk about the technology but not give out the code im

Re: Using addCacheArchive

2009-06-25 Thread akhil1988
Yes, my HDFS paths are of the form /home/user-name/, and I have used these in DistributedCache's addCacheFiles method successfully. Thanks, Akhil Amareshwari Sriramadasu wrote: > > Is your hdfs path /home/akhil1988/Config.zip? Usually an hdfs path is of the > form /user/akhil1988/Config.zip. > J

Re: Using addCacheArchive

2009-06-25 Thread Amareshwari Sriramadasu
Is your hdfs path /home/akhil1988/Config.zip? Usually an hdfs path is of the form /user/akhil1988/Config.zip. Just wondering if you are giving a wrong path in the URI! Thanks Amareshwari akhil1988 wrote: Thanks Amareshwari for your reply! The file Config.zip is in HDFS; if it were not, the error would h

Re: Using addCacheArchive

2009-06-25 Thread akhil1988
Thanks Amareshwari for your reply! The file Config.zip is in HDFS; if it were not, the error would have been reported by the jobtracker itself while executing the statement: DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"), conf); But I get the error in the ma

Re: Using addCacheArchive

2009-06-25 Thread Amareshwari Sriramadasu
Hi Akhil, DistributedCache.addCacheArchive takes a path on HDFS. From your code, it looks like you are passing a local path. Also, if you want to create a symlink, you should pass the URI as hdfs://<path>#<link-name>, besides calling DistributedCache.createSymlink(conf); Thanks Amareshwari akhil1988 wrote: Please a
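
A minimal sketch of the usage described here, against the old org.apache.hadoop.filecache.DistributedCache API; the namenode host/port and the #Config link name are illustrative:

    import java.net.URI;
    import java.net.URISyntaxException;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheSetup {
        public static void addConfigArchive(JobConf conf) throws URISyntaxException {
            // The URI must name an archive already in HDFS; the #Config
            // fragment is the symlink that appears in the task's working
            // directory once the archive is unpacked on the task node.
            DistributedCache.addCacheArchive(
                new URI("hdfs://namenode:9000/user/akhil1988/Config.zip#Config"),
                conf);
            DistributedCache.createSymlink(conf);
        }
    }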

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Edward J. Yoon
I just made a wiki page -- http://wiki.apache.org/hadoop/Hambrug -- let's discuss the graph computing framework named Hambrug. On Fri, Jun 26, 2009 at 8:43 AM, Edward J. Yoon wrote: > To be honest, I had thought about using BigTable (HBase) for map/reduce > based graph/matrix operations. The mai

Re: 'could not lock file' error.

2009-06-25 Thread Edward J. Yoon
Please ignore. I just made a new account. On Fri, Jun 26, 2009 at 11:42 AM, Edward J. Yoon wrote: > Hi, > > I always get the 'could not lock file' error when editing/creating > pages - "Page could not get locked. Missing 'current' file?" > > My ID is 'udanax'. Can someone help me? > -- > Best Regard

'could not lock file' error.

2009-06-25 Thread Edward J. Yoon
Hi, I always get the 'could not lock file' error when editing/creating pages - "Page could not get locked. Missing 'current' file?" My ID is 'udanax'. Can someone help me? -- Best Regards, Edward J. Yoon @ NHN, corp. edwardy...@apache.org http://blog.udanax.org

graphical tool for hadoop mapreduce

2009-06-25 Thread Manhee Jo
Hi, Do you know any graphical tools that show the progress of mapreduce using the job log under logs/history/ ? The web interface (namenode:50030) gives me a similar one, but what I need is a more specific one that shows the number of total running map tasks and reduce tasks at some points in time, w

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Edward J. Yoon
To be honest, I had thought about using BigTable (HBase) for map/reduce-based graph/matrix operations. The main performance problems were the sequential algorithm, the cost of MR job setup in iterations, and the locality of adjacent components. As mentioned for Pregel, if some algorithm requires

Re: THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-25 Thread Bradford Stephens
Hey all, Just writing a quick note of "thanks", we had another solid group of people show up! As always, we learned quite a lot about interesting use cases for Hadoop, Lucene, and the rest of the Apache 'Cloud Stack'. I couldn't get it taped, but we talked about: -Scaling Lucene with Katta and

Re: Using addCacheArchive

2009-06-25 Thread akhil1988
Please ask if anything above about the problem I am facing is unclear. Thanks, Akhil akhil1988 wrote: > > Hi All! > > I want a directory to be present in the local working directory of the > task, for which I am using the following statements: > > DistributedCache.addCacheArchive(new

map.input.file in hadoop0.20

2009-06-25 Thread Amandeep Khurana
How do I read the "map.input.file" parameter in the mapper class in hadoop 0.20? In earlier versions, this would work: public void configure(JobConf conf) { filename = conf.get("map.input.file"); } What about 0.20? Amandeep Amandeep Khurana Computer Science Graduate Student
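
With the new org.apache.hadoop.mapreduce API in 0.20 the configure(JobConf) hook is gone. One approach that avoids the "map.input.file" property entirely is to read the path off the input split in setup(); a sketch, assuming a FileInputFormat-based job so the split is a FileSplit:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class FileNameMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String filename;

        @Override
        protected void setup(Context context) {
            // Under the new API the input file comes from the split itself.
            filename = ((FileSplit) context.getInputSplit()).getPath().getName();
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Tag every record with the file it came from.
            context.write(new Text(filename), value);
        }
    }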

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Amandeep Khurana
I've been working on some graph stuff using MR as well, and I'd be more than happy to chip in. I remember exchanging a few mails with Paolo about having an RDF store over HBase and developing graph algorithms over it. Amandeep Khurana Computer Science Graduate Student University of Cal

RE: FYI, Large-scale graph computing at Google

2009-06-25 Thread Patterson, Josh
Steve, I'm a little lost here; is this a replacement for M/R, or is it some new code that sits on top of M/R and runs an iteration over some sort of graph's vertices? My quick scan of Google's article didn't seem to yield a distinction. Either way, I'd say for our data that a graph processing lib fo

Using addCacheArchive

2009-06-25 Thread akhil1988
Hi All! I want a directory to be present in the local working directory of the task for which I am using the following statements: DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"), conf); DistributedCache.createSymlink(conf); >> Here Config is a directory which I have zip

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Steve Loughran
mike anderson wrote: This would be really useful for my current projects. I'd be more than happy to help out if needed. Well, the first bit of code to play with, then, is this: http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/citerank/ The standalone.xml file is the one yo

Re: HDFS Safemode and EC2 EBS?

2009-06-25 Thread Tom White
Hi Chris, You should really start all the slave nodes to be sure that you don't lose data. If you start fewer than #nodes - #replication + 1 nodes then you are virtually guaranteed to lose blocks. Starting 6 nodes out of 10 will cause the filesystem to remain in safe mode, as you've seen. BTW I'm

HDFS Safemode and EC2 EBS?

2009-06-25 Thread Chris Curtin
Hi, I am using 0.19.0 on EC2. The Hadoop execution and HDFS directories are on EBS volumes mounted to each node in my EC2 cluster. Only the install of hadoop is in the AMI. We have 10 EBS volumes and when the cluster starts it randomly picks one for each slave. We don't always start all 10 slaves

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread mike anderson
This would be really useful for my current projects. I'd be more than happy to help out if needed. On Thu, Jun 25, 2009 at 5:57 AM, Steve Loughran wrote: > Edward J. Yoon wrote: > >> What do you think about another new computation framework on HDFS? >> >> On Mon, Jun 22, 2009 at 3:50 PM, Edward

Re: Rebalancing Hadoop Cluster running 15.3

2009-06-25 Thread Usman Waheed
Thanks much, Cheers, Usman You can change the value of hadoop.root.logger in conf/log4j.properties to change the log level globally. See also the section "Custom Logging levels" in the same file to set levels on a per-component basis. You can also use hadoop daemonlog to set log levels on a temp

Re: Rebalancing Hadoop Cluster running 15.3

2009-06-25 Thread Tom White
You can change the value of hadoop.root.logger in conf/log4j.properties to change the log level globally. See also the section "Custom Logging levels" in the same file to set levels on a per-component basis. You can also use hadoop daemonlog to set log levels on a temporary basis (they are reset o

Re: Rebalancing Hadoop Cluster running 15.3

2009-06-25 Thread Usman Waheed
Hi Tom, Thanks for the trick :). I tried setting the replication to 3 in hadoop-default.xml, but then the namenode log file in /var/log/hadoop started filling up with the messages marked in bold: 2009-06-24 14:39:06,338 INFO org.apache.hadoop.dfs.StateChange: STATE* SafeModeInfo.leav

What is the best way to use the Hadoop output data

2009-06-25 Thread Huy Phan
Hi everybody, I'm working on a hadoop project that processes log files. In the reduce part, as usual, I store the output to HDFS, but I also want to send that output data to a message queue using an HTTP POST request. I'm wondering if there's any performance killer in this approach; I posted the
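
On the performance side: a blocking POST per reduce record stalls the reducer on network round-trips, so batching records, or posting once per task from close(), limits the cost. A minimal sketch of the POST itself using only java.net; the queue endpoint URL is hypothetical:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class QueuePoster {
        public static void post(String payload) throws IOException {
            URL url = new URL("http://mq.example.com/enqueue"); // hypothetical endpoint
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            OutputStream out = conn.getOutputStream();
            try {
                out.write(payload.getBytes("UTF-8"));
            } finally {
                out.close();
            }
            // Reading the response code forces the request to complete.
            if (conn.getResponseCode() >= 400) {
                throw new IOException("POST failed: HTTP " + conn.getResponseCode());
            }
            conn.disconnect();
        }
    }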

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Steve Loughran
Edward J. Yoon wrote: What do you think about another new computation framework on HDFS? On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon wrote: http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html -- It sounds like Pregel is a computing framework based on d

Re: Hadoop 0.20.0, xml parsing related error

2009-06-25 Thread Steve Loughran
Ram Kulbak wrote: Hi, The exception is a result of having xerces in the classpath. To resolve, make sure you are using Java 6 and set the following system property: -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl This can also be re
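
If you launch the job from your own driver, the same property can be pinned programmatically before any Hadoop code parses configuration XML; a minimal sketch (the driver class is hypothetical, the factory class name is the one given above):

    public class JobDriver {
        public static void main(String[] args) throws Exception {
            // Equivalent to the -D flag above; must run before the first
            // Configuration object is created.
            System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
                "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl");
            // ... build the JobConf and submit the job as usual.
        }
    }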

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread krishna prasanna
Thanks all, the problem is resolved now. The issue is the same: the jar file was in HDFS, which logically is wrong. Krishna. Hi Krishna, You get this error when the jar file cannot be found. It looks like /user/hadoop/hadoop-0.18.0-examples.jar is an HDFS path, when in fact it should be a local path. Cheer

Re: Rebalancing Hadoop Cluster running 15.3

2009-06-25 Thread Tom White
Hi Usman, Before the rebalancer was introduced one trick people used was to increase the replication on all the files in the system, wait for re-replication to complete, then decrease the replication to the original level. You can do this using hadoop fs -setrep. Cheers, Tom On Thu, Jun 25, 2009
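
The same trick can also be driven through the Java API rather than the shell; a rough sketch against the FileSystem API (method names as in later 0.1x releases, so they may differ slightly on 15.3; error handling omitted):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationBumper {
        // Recursively set the replication factor on every file under root,
        // like "hadoop fs -setrep -R": run once with a higher factor, wait
        // for re-replication to finish, then run again with the original.
        public static void setRep(FileSystem fs, Path root, short rep)
                throws IOException {
            for (FileStatus stat : fs.listStatus(root)) {
                if (stat.isDir()) {
                    setRep(fs, stat.getPath(), rep);
                } else {
                    fs.setReplication(stat.getPath(), rep);
                }
            }
        }

        public static void main(String[] args) throws IOException {
            FileSystem fs = FileSystem.get(new Configuration());
            setRep(fs, new Path("/"), (short) 3);
        }
    }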

Rebalancing Hadoop Cluster running 15.3

2009-06-25 Thread Usman Waheed
Hi, One of our test clusters is running Hadoop 15.3 with the replication level set to 2. The datanodes are not balanced at all: Datanode_1: 52% Datanode_2: 82% Datanode_3: 30% 15.3 does not have the rebalancer capability; we are planning to upgrade, but not for now. If I take out Datanode_1 fro

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread Tom White
Hi Krishna, You get this error when the jar file cannot be found. It looks like /user/hadoop/hadoop-0.18.0-examples.jar is an HDFS path, when in fact it should be a local path. Cheers, Tom On Thu, Jun 25, 2009 at 9:43 AM, krishna prasanna wrote: > Oh! thanks Shravan > > Krishna. > > > >

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread Amareshwari Sriramadasu
Is your jar file in the local file system or HDFS? The jar file should be in the local fs. Thanks Amareshwari Shravan Mahankali wrote: I am having a similar issue as well... there is no solution yet!!! Thank You, Shravan Kumar. M Catalytic Software Ltd. [SEI-CMMI Level 5 Company] -

Re: Problem with setting up the cluster

2009-06-25 Thread Tom White
Have a look at the datanode log files on the datanode machines and see what the error is in there. Cheers, Tom On Thu, Jun 25, 2009 at 6:21 AM, .ke. sivakumar wrote: > Hi all, I'm a student and I have been trying to set up the hadoop cluster for > a while but have been unsuccessful till now. > >

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Edward J. Yoon
What do you think about another new computation framework on HDFS? On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon wrote: > > http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html > -- It sounds like Pregel is a computing framework based on dynamic > programmin

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread krishna prasanna
Oh! Thanks Shravan. Krishna. From: Shravan Mahankali To: core-user@hadoop.apache.org Sent: Thursday, 25 June, 2009 1:50:51 PM Subject: RE: Unable to run Jar file in Hadoop. I am having a similar issue as well... there is no solution yet!!! Thank You, Shravan Kumar. M C

RE: Unable to run Jar file in Hadoop.

2009-06-25 Thread Shravan Mahankali
I am having a similar issue as well... there is no solution yet!!! Thank You, Shravan Kumar. M Catalytic Software Ltd. [SEI-CMMI Level 5 Company] - This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom t

Unable to run Jar file in Hadoop.

2009-06-25 Thread krishna prasanna
Hi, When I am trying to run a jar in Hadoop, it gives me the following error: had...@krishna-dev:/usr/local/hadoop$ bin/hadoop jar /user/hadoop/hadoop-0.18.0-examples.jar java.io.IOException: Error opening job jar: /user/hadoop/hadoop-0.18.0-examples.jar at org.apache.hadoop.util.Run