Re: How to skip bad records in .19.1

2009-03-13 Thread 柳松
Thanks for your answer. I've found the problem: I forgot to implement the Writable interface in my input values' class. Best wishes, Song Liu from Suzhou University. On 2009-03-13 17:19:35, "Sharad Agarwal" wrote: > comments are inline: > 柳松 wrote: > Dear all: I have set the value "Sk
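
The fix described here — the custom value class must implement Hadoop's Writable interface — looks roughly like the sketch below. The class name and fields are made up for illustration; only the interface and the no-arg constructor requirement come from the thread's resolution.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical value class: map output values must implement Writable
// (keys additionally need WritableComparable).
public class MyValue implements Writable {
  private int count;
  private String label = "";

  public MyValue() {}  // Hadoop needs a no-argument constructor to deserialize instances

  public void write(DataOutput out) throws IOException {
    out.writeInt(count);
    out.writeUTF(label);
  }

  public void readFields(DataInput in) throws IOException {
    count = in.readInt();
    label = in.readUTF();
  }
}
```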

Re: HTTP addressable files from HDFS?

2009-03-13 Thread jason hadoop
wget http://namenode:port/data/filename will return the file. The namenode will redirect the HTTP request to a datanode that has at least some of the blocks in local storage to serve the actual request. The key piece, of course, is the /data prefix on the file name. port is the port that the w
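
A concrete invocation might look like the line below; the host, port, and path are assumptions (50070 was the usual namenode web UI port in this era), not taken from the thread.

```sh
# Fetch an HDFS file over HTTP via the namenode's /data servlet; the namenode
# redirects to a datanode holding some of the blocks.
wget "http://namenode.example.com:50070/data/user/hadoop/output/part-00000"
```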

HTTP addressable files from HDFS?

2009-03-13 Thread David Michael
Hello, I realize that using HTTP you can have a file in HDFS streamed - that is, the servlet responds to the following request with Content-Disposition: attachment, and a download is forced (at least from a browser's perspective), like so: http://localhost:50075/streamFile?filename=/somewhe

Re: Hadoop Streaming throw an exception with wget as the mapper

2009-03-13 Thread S D
I've used wget with Hadoop Streaming without any problems. Based on the error code you're getting, I suggest you make sure that you have the proper write permissions for the directory in which Hadoop will do its processing (e.g., download, convert, ...) on each of the tasktracker machines. The location wher
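
For reference, a minimal streaming invocation of this kind could look like the sketch below. The jar path, input file, and wget flags are assumptions, not taken from the thread; the point is only that the mapper runs in the task's working directory, which must be writable.

```sh
# Each input line is a URL; wget reads them from stdin (-i -) and saves the files
# into the task's working directory. Map-only job, no reducers.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.19.1-streaming.jar \
  -input urls.txt \
  -output fetch-logs \
  -mapper 'wget -i - -nv -P downloads' \
  -numReduceTasks 0
```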

Re: Controlling maximum # of tasks per node on per-job basis?

2009-03-13 Thread S D
I ran into this problem as well and several people on this list provided a helpful response: once the tasktracker starts, the maximum number of tasks per node cannot be changed. In my case, I've solved this challenge by stopping and starting mapred (stop-mapred.sh, start-mapred.sh) between jobs. T
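
For context, the per-node limits being discussed are the tasktracker slot settings below (real property names; the value 3 just mirrors the question). They are read when the tasktracker starts, which is why a restart is needed to change them.

```xml
<!-- hadoop-site.xml on each tasktracker node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>
```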

Controlling maximum # of tasks per node on per-job basis?

2009-03-13 Thread Stuart White
My cluster nodes have 2 dual-core processors, so, in general, I want to configure my nodes with a maximum of 3 task processes executed per node at a time. But, for some jobs, my tasks load large amounts of memory, and I cannot fit 3 such tasks on a single node. For these jobs, I'd like to enforce

Re: Changing logging level

2009-03-13 Thread Amandeep Khurana
Thanks. So, the logging that I wanted to tweak was at the client end, where I am using the DistributedFileSystem class instead of using the shell to read data. Changing the logging level there can't be done through these methods. I got it to work by rebuilding the jars after tweaking the default lo
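
An alternative to rebuilding the jars, sketched here as an assumption rather than what Amandeep actually did, is to raise the log4j level programmatically in the client before touching the filesystem:

```java
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietHdfsClient {
  public static void main(String[] args) throws Exception {
    // Silence DEBUG output from the Hadoop client classes for this JVM only.
    Logger.getLogger("org.apache.hadoop").setLevel(Level.INFO);
    // ... obtain the FileSystem / DistributedFileSystem and read data here ...
  }
}
```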

Re: Changing logging level

2009-03-13 Thread Richa Khandelwal
Two ways: In hadoop-site.xml, add: mapred.task.profile = true (set profiling option to true), mapred.task.profile.maps = 1 (profiling level of maps), mapred.task.profile.reduces = 1 (profiling level of reducers). Or in your code add JobConf.setProfileEnabled(true); JobConf
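
Laid out as they would appear in hadoop-site.xml, the properties Richa lists are shown below. Note they control task profiling rather than log levels, so they are tangential to the original question.

```xml
<property>
  <name>mapred.task.profile</name>
  <value>true</value>
  <description>Set profiling option to true.</description>
</property>
<property>
  <name>mapred.task.profile.maps</name>
  <value>1</value>
  <description>Profiling level of maps.</description>
</property>
<property>
  <name>mapred.task.profile.reduces</name>
  <value>1</value>
  <description>Profiling level of reducers.</description>
</property>
```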

Re: Cloudera Hadoop and Hive training now free online

2009-03-13 Thread Christophe Bisciglia
Hey Lukas, we love hearing about what you'd like to see in training. If you make a note on Get Satisfaction, we'll track it and keep you apprised of updates: http://getsatisfaction.com/cloudera/products/cloudera_hadoop_training Christophe On Fri, Mar 13, 2009 at 2:27 PM, Lukáš Vlček wrote: > Hi

Re: Reducers spawned when mapred.reduce.tasks=0

2009-03-13 Thread Chris K Wensel
fwiw, we have released a workaround for this issue in Cascading 1.0.5. http://www.cascading.org/ http://cascading.googlecode.com/files/cascading-1.0.5.tgz In short, Hadoop 0.19.0 and .1 instantiate the user's Reducer class and subsequently call configure() when there is no intention to use the
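
This is not the Cascading fix, but one defensive pattern an application-level Reducer can use against the behavior Chris describes is to bail out of configure() when the job is map-only. The class and its contents below are hypothetical.

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class GuardedReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  @Override
  public void configure(JobConf conf) {
    // In 0.19.0/0.19.1 configure() may be invoked even when mapred.reduce.tasks=0,
    // so skip any expensive setup that only a real reduce phase needs.
    if (conf.getNumReduceTasks() == 0) {
      return;
    }
    // ... expensive initialization here ...
  }

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      output.collect(key, values.next());
    }
  }
}
```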

Re: null value output from map...

2009-03-13 Thread Owen O'Malley
On Mar 13, 2009, at 3:56 PM, Richa Khandelwal wrote: You can initialize IntWritable with an empty constructor. IntWritable i=new IntWritable(); NullWritable is better for that application than IntWritable. It doesn't consume any space when serialized. *smile* -- Owen
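
Put together, Owen's suggestion looks something like the mapper below; the class name and the choice of Text keys are illustrative, and the relevant part is emitting NullWritable.get() as the value.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class KeyOnlyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, NullWritable> output, Reporter reporter)
      throws IOException {
    // NullWritable is a singleton and serializes to zero bytes.
    output.collect(line, NullWritable.get());
  }
}
```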

Re: null value output from map...

2009-03-13 Thread Richa Khandelwal
You can initialize IntWritable with an empty constructor. IntWritable i=new IntWritable(); On Fri, Mar 13, 2009 at 2:21 PM, Andy Sautins wrote: > > > In writing a Map/Reduce job I ran across something I found a little > strange. I have a situation where I don't need a value output from map. >

Re: Cloudera Hadoop and Hive training now free online

2009-03-13 Thread Lukáš Vlček
Hi, This is excellent! Do any of these presentations deal specifically with processing tree and graph data structures? I know that some basics can be found in the fifth MapReduce lecture here (http://www.youtube.com/watch?v=BT-piFBP4fE) presented by Aaron Kimball or here (http://video.google.co

null value output from map...

2009-03-13 Thread Andy Sautins
In writing a Map/Reduce job I ran across something I found a little strange. I have a situation where I don't need a value output from map. If I set the value passed to the OutputCollector to null I get the following exception: java.lang.NullPointerException at org.apache.hadoop.ma

Re: Building Release 0.19.1

2009-03-13 Thread Kevin Peterson
There may be a separate issue with windows, but the error related to: [javac] import org.eclipse.jdt.internal.debug.ui.launcher.JavaApplicationLaunchShortcut; is the eclipse 3.4 issue that is addressed by the patch in https://issues.apache.org/jira/browse/HADOOP-3744

Cloudera Hadoop and Hive training now free online

2009-03-13 Thread Christophe Bisciglia
Hey there, today we released our basic Hadoop and Hive training online. Access is free, and we address questions through Get Satisfaction. Many on this list are surely pros, but when you have friends trying to get up to speed, feel free to send this along. We provide a VM so new users can start do

Hadoop Upgrade Wiki

2009-03-13 Thread Mayuran Yogarajah
Step 8 of the upgrade process mentions copying the 'edits' and 'fsimage' files to a backup directory. After step 19 it says: 'In case of failure the administrator should have the checkpoint files in order to be able to repeat the procedure from the appropriate point or to restart the old version
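
As a rough sketch of that backup step (directory locations are assumptions; substitute whatever dfs.name.dir points to on your namenode):

```sh
# Back up the namenode metadata before upgrading.
DFS_NAME_DIR=/path/to/dfs/name        # value of dfs.name.dir
BACKUP_DIR=/path/to/upgrade-backup
mkdir -p "${BACKUP_DIR}"
cp -p "${DFS_NAME_DIR}/current/fsimage" "${DFS_NAME_DIR}/current/edits" "${BACKUP_DIR}/"
```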

Re: tuning performance

2009-03-13 Thread Scott Carey
On 3/13/09 11:56 AM, "Allen Wittenauer" wrote: On 3/13/09 11:25 AM, "Vadim Zaliva" wrote: >>When you stripe you automatically make every disk in the system have the >> same speed as the slowest disk. In our experiences, systems are more likely >> to have a 'slow' disk than a dead one a

Re: Creating Lucene index in Hadoop

2009-03-13 Thread Ning Li
Or you can check out the index contrib. The difference between the two is: - In Nutch's indexing map/reduce job, indexes are built in the reduce phase. Afterwards, they are merged into a smaller number of shards if necessary. The last time I checked, the merge process does not use map/reduce. - I

Re: how to optimize mapreduce procedure??

2009-03-13 Thread Ning Li
I would agree with Enis. MapReduce is good for batch building large indexes, but not for search which requires realtime response. Cheers, Ning On Fri, Mar 13, 2009 at 10:58 AM, Enis Soztutar wrote: > ZhiHong Fu wrote: >> >> Hello, >> >>           I'm writing a program which will finish lucene s

Re: tuning performance

2009-03-13 Thread Allen Wittenauer
On 3/13/09 11:25 AM, "Vadim Zaliva" wrote: >>    When you stripe you automatically make every disk in the system have the >> same speed as the slowest disk.  In our experiences, systems are more likely >> to have a 'slow' disk than a dead one and detecting that is really >> really hard.  I

Re: Hello, world for Hadoop + Lucene

2009-03-13 Thread Ning Li
Sorry for the late reply. You can refer to the test case TestIndexUpdater.java as an example. It uses the index contrib to build a Lucene index and verifies by querying on the index built. Cheers, Ning On Wed, Jan 14, 2009 at 12:05 PM, John Howland wrote: > Howdy! > > Is there any sort of "Hell

Re: tuning performance

2009-03-13 Thread Vadim Zaliva
>    When you stripe you automatically make every disk in the system have the > same speed as the slowest disk.  In our experiences, systems are more likely > to have a 'slow' disk than a dead one and detecting that is really > really hard.  In a distributed system, that multiplier effect can h

Changing logging level

2009-03-13 Thread Amandeep Khurana
I am using the DistributedFileSystem class to read data from HDFS (with some source code of HDFS modified by me). When I read a file, I'm getting all the DEBUG-level log messages on the stdout of the client that I wrote. How can I change the level to INFO? I haven't mentioned the debug level anywhere.

Re: how to upload files by web page

2009-03-13 Thread nitesh bhatia
Hi, I was also looking for a solution to the same problem. I haven't tested it, but I think we can use the Globus Toolkit's GSI-FTP feature for this. In the RSL config file one can write the hdfs copy command to copy the file to HDFS. I've used this feature to upload and process files from Globus to Sun

How to skip bad records in .19.1

2009-03-13 Thread Sharad Agarwal
comments are inline: 柳松 wrote: Dear all: I have set the value "SkipBadRecords.setMapperMaxSkipRecords(conf, 1)", and also the "SkipBadRecords.setAttemptsToStartSkipping(conf, 2)". However, after 3 failed attempts, it gave me this exception message: java.lang.NullPointerException
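
For reference, the two calls quoted above would sit in the job setup roughly as follows (only the two SkipBadRecords lines come from the thread; the surrounding class is a placeholder):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipBadRecordsSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Skip at most one record around a failure, and start skipping
    // after two failed attempts of the same task.
    SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
    SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
    // ... the rest of the job configuration and JobClient.runJob(conf) ...
  }
}
```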

Re: how to optimize mapreduce procedure??

2009-03-13 Thread Enis Soztutar
ZhiHong Fu wrote: Hello, I'm writing a program that performs Lucene searches over about 12 index directories, all of which are stored in HDFS. It works like this: 1. We build about 12 index directories through Lucene's indexing functionality, each about 100 MB in size. 2. We store thes

Reduce task going away for 10 seconds at a time

2009-03-13 Thread Doug Cook
Hi folks, I've been debugging a severe performance problem with a Hadoop-based application (a highly modified version of Nutch). I've recently upgraded to Hadoop 0.19.1 from a much, much older version, and a reduce that used to work just fine is now running orders of magnitude more slowly. >Fr

csv input format handling and mapping

2009-03-13 Thread Stefan Podkowinski
Hi, can anyone share their experience or a solution for the following problem? I have to deal with a lot of different file formats, most of them CSV. They share similar semantics, i.e., fields in file A exist in file B as well. What I'm not sure of is the exact index of the field in the csv
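
One common way to cope with field positions that differ per file, offered here purely as an illustration (nothing below comes from the thread), is to resolve column indices from a header line instead of hard-coding them:

```java
import java.util.HashMap;
import java.util.Map;

public class CsvHeaderIndex {
  // Build a column-name -> index map from a CSV header line.
  public static Map<String, Integer> indexOf(String headerLine) {
    Map<String, Integer> index = new HashMap<String, Integer>();
    String[] names = headerLine.split(",");
    for (int i = 0; i < names.length; i++) {
      index.put(names[i].trim(), i);
    }
    return index;
  }

  public static void main(String[] args) {
    Map<String, Integer> idx = indexOf("id,name,amount");
    String[] fields = "42,widget,9.99".split(",");
    System.out.println(fields[idx.get("amount")]);  // prints 9.99
  }
}
```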

Re: how to preserve original line order?

2009-03-13 Thread Miles Osborne
associate with each line an identifier (e.g., line number) and afterwards re-sort the data by that. Miles 2009/3/13 Roldano Cattoni : > The task should be simple, I want to put in uppercase all the words of a > (large) file. > > I tried the following: >  - streaming mode >  - the mapper is a perl scri
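
Outside of Hadoop, the idea reduces to something like the shell sketch below (file names are placeholders): tag every line with its line number before processing, then sort on that tag afterwards and strip it.

```sh
nl -ba input.txt > numbered.txt                   # "<lineno>\t<original line>"
# ... run the uppercasing job over numbered.txt, leaving the leading number intact ...
sort -n -k1,1 processed.txt | cut -f2- > ordered.txt
```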