Re: How to skip bad records in .19.1

2009-03-13 Thread 柳松
Thanks for your answer. I've found the problem: I forgot to implement the Writable interface in my input values' class. Best wishes, Song Liu from Suzhou University. On 2009-03-13 17:19:35, "Sharad Agarwal" wrote: > comments are inline: > 柳松 wrote: > Dear all: I have set the value "Sk
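
The fix described here — the custom value class must implement Hadoop's Writable interface — looks roughly like the sketch below. The class name and fields are made up for illustration; only the interface and the no-arg constructor requirement come from the thread's resolution.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Hypothetical value class: map output values must implement Writable
// (keys additionally need WritableComparable).
public class MyValue implements Writable {
  private int count;
  private String label = "";

  public MyValue() {}  // Hadoop needs a no-argument constructor to deserialize instances

  public void write(DataOutput out) throws IOException {
    out.writeInt(count);
    out.writeUTF(label);
  }

  public void readFields(DataInput in) throws IOException {
    count = in.readInt();
    label = in.readUTF();
  }
}
```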

Re: HTTP addressable files from HDFS?

2009-03-13 Thread jason hadoop
wget http://namenode:port/data/filename will return the file. The namenode will redirect the HTTP request to a datanode that has at least some of the blocks in local storage to serve the actual request. The key piece, of course, is the /data prefix on the file name. port is the port that the w
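
A concrete invocation might look like the line below; the host, port, and path are assumptions (50070 was the usual namenode web UI port in this era), not taken from the thread.

```sh
# Fetch an HDFS file over HTTP via the namenode's /data servlet; the namenode
# redirects to a datanode holding some of the blocks.
wget "http://namenode.example.com:50070/data/user/hadoop/output/part-00000"
```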

HTTP addressable files from HDFS?

2009-03-13 Thread David Michael
Hello, I realize that using HTTP you can have a file in HDFS streamed - that is, the servlet responds to the following request with Content-Disposition: attachment, and a download is forced (at least from a browser's perspective), like so: http://localhost:50075/streamFile?filename=/somewhe

Re: Hadoop Streaming throw an exception with wget as the mapper

2009-03-13 Thread S D
I've used wget with Hadoop Streaming without any problems. Based on the error code you're getting, I suggest you make sure that you have the proper write permissions for the directory in which Hadoop will do its processing (e.g., download, convert, ...) on each of the tasktracker machines. The location wher
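
For reference, a minimal streaming invocation of this kind could look like the sketch below. The jar path, input file, and wget flags are assumptions, not taken from the thread; the point is only that the mapper runs in the task's working directory, which must be writable.

```sh
# Each input line is a URL; wget reads them from stdin (-i -) and saves the files
# into the task's working directory. Map-only job, no reducers.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.19.1-streaming.jar \
  -input urls.txt \
  -output fetch-logs \
  -mapper 'wget -i - -nv -P downloads' \
  -numReduceTasks 0
```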

Re: Controlling maximum # of tasks per node on per-job basis?

2009-03-13 Thread S D
I ran into this problem as well and several people on this list provided a helpful response: once the tasktracker starts, the maximum number of tasks per node cannot be changed. In my case, I've solved this challenge by stopping and starting mapred (stop-mapred.sh, start-mapred.sh) between jobs. T
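
For context, the per-node limits being discussed are the tasktracker slot settings below (real property names; the value 3 just mirrors the question). They are read when the tasktracker starts, which is why a restart is needed to change them.

```xml
<!-- hadoop-site.xml on each tasktracker node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>3</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>
</property>
```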

Controlling maximum # of tasks per node on per-job basis?

2009-03-13 Thread Stuart White
My cluster nodes have 2 dual-core processors, so, in general, I want to configure my nodes with a maximum of 3 task processes executed per node at a time. But, for some jobs, my tasks load large amounts of memory, and I cannot fit 3 such tasks on a single node. For these jobs, I'd like to enforce

Re: Changing logging level

2009-03-13 Thread Amandeep Khurana
Thanks. So, the logging that I wanted to tweak was at the client end, where I am using the DistributedFileSystem class instead of using the shell to read data. Changing the logging level there can't be done through these methods. I got it to work by rebuilding the jars after tweaking the default lo
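
An alternative to rebuilding the jars, sketched here as an assumption rather than what Amandeep actually did, is to raise the log4j level programmatically in the client before touching the filesystem:

```java
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietHdfsClient {
  public static void main(String[] args) throws Exception {
    // Silence DEBUG output from the Hadoop client classes for this JVM only.
    Logger.getLogger("org.apache.hadoop").setLevel(Level.INFO);
    // ... obtain the FileSystem / DistributedFileSystem and read data here ...
  }
}
```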

Re: Changing logging level

2009-03-13 Thread Richa Khandelwal
Two ways: In hadoop-site.xml, add: mapred.task.profile = true (set profiling option to true), mapred.task.profile.maps = 1 (profiling level of maps), mapred.task.profile.reduces = 1 (profiling level of reducers). Or in your code add JobConf.setProfileEnabled(true); JobConf
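
Laid out as they would appear in hadoop-site.xml, the properties Richa lists are shown below. Note they control task profiling rather than log levels, so they are tangential to the original question.

```xml
<property>
  <name>mapred.task.profile</name>
  <value>true</value>
  <description>Set profiling option to true.</description>
</property>
<property>
  <name>mapred.task.profile.maps</name>
  <value>1</value>
  <description>Profiling level of maps.</description>
</property>
<property>
  <name>mapred.task.profile.reduces</name>
  <value>1</value>
  <description>Profiling level of reducers.</description>
</property>
```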

Re: Cloudera Hadoop and Hive training now free online

2009-03-13 Thread Christophe Bisciglia
Hey Lukas, we love hearing about what you'd like to see in training. If you make a note on Get Satisfaction, we'll track it and keep you apprised of updates: http://getsatisfaction.com/cloudera/products/cloudera_hadoop_training Christophe On Fri, Mar 13, 2009 at 2:27 PM, Lukáš Vlček wrote: > Hi

Re: Reducers spawned when mapred.reduce.tasks=0

2009-03-13 Thread Chris K Wensel
fwiw, we have released a workaround for this issue in Cascading 1.0.5. http://www.cascading.org/ http://cascading.googlecode.com/files/cascading-1.0.5.tgz In short, Hadoop 0.19.0 and .1 instantiate the user's Reducer class and subsequently call configure() when there is no intention to use the
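
This is not the Cascading fix, but one defensive pattern an application-level Reducer can use against the behavior Chris describes is to bail out of configure() when the job is map-only. The class and its contents below are hypothetical.

```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class GuardedReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  @Override
  public void configure(JobConf conf) {
    // In 0.19.0/0.19.1 configure() may be invoked even when mapred.reduce.tasks=0,
    // so skip any expensive setup that only a real reduce phase needs.
    if (conf.getNumReduceTasks() == 0) {
      return;
    }
    // ... expensive initialization here ...
  }

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    while (values.hasNext()) {
      output.collect(key, values.next());
    }
  }
}
```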

Re: null value output from map...

2009-03-13 Thread Owen O'Malley
On Mar 13, 2009, at 3:56 PM, Richa Khandelwal wrote: You can initialize IntWritable with an empty constructor. IntWritable i=new IntWritable(); NullWritable is better for that application than IntWritable. It doesn't consume any space when serialized. *smile* -- Owen
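
Put together, Owen's suggestion looks something like the mapper below; the class name and the choice of Text keys are illustrative, and the relevant part is emitting NullWritable.get() as the value.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class KeyOnlyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, NullWritable> output, Reporter reporter)
      throws IOException {
    // NullWritable is a singleton and serializes to zero bytes.
    output.collect(line, NullWritable.get());
  }
}
```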

Re: null value output from map...

2009-03-13 Thread Richa Khandelwal
You can initialize IntWritable with an empty constructor. IntWritable i=new IntWritable(); On Fri, Mar 13, 2009 at 2:21 PM, Andy Sautins wrote: > > > In writing a Map/Reduce job I ran across something I found a little > strange. I have a situation where I don't need a value output from map. >

Re: Cloudera Hadoop and Hive training now free online

2009-03-13 Thread Lukáš Vlček
Hi, This is excellent! Do any of these presentations deal specifically with processing tree and graph data structures? I know that some basics can be found in the fifth MapReduce lecture here (http://www.youtube.com/watch?v=BT-piFBP4fE) presented by Aaron Kimball or here (http://video.google.co

null value output from map...

2009-03-13 Thread Andy Sautins
In writing a Map/Reduce job I ran across something I found a little strange. I have a situation where I don't need a value output from map. If I set the value passed to the OutputCollector to null I get the following exception: java.lang.NullPointerException at org.apache.hadoop.ma

Re: Building Release 0.19.1

2009-03-13 Thread Kevin Peterson
There may be a separate issue with windows, but the error related to: [javac] import org.eclipse.jdt.internal.debug.ui.launcher.JavaApplicationLaunchShortcut; is the eclipse 3.4 issue that is addressed by the patch in https://issues.apache.org/jira/browse/HADOOP-3744

Cloudera Hadoop and Hive training now free online

2009-03-13 Thread Christophe Bisciglia
Hey there, today we released our basic Hadoop and Hive training online. Access is free, and we address questions through Get Satisfaction. Many on this list are surely pros, but when you have friends trying to get up to speed, feel free to send this along. We provide a VM so new users can start do

Hadoop Upgrade Wiki

2009-03-13 Thread Mayuran Yogarajah
Step 8 of the upgrade process mentions copying the 'edits' and 'fsimage' files to a backup directory. After step 19 it says: 'In case of failure the administrator should have the checkpoint files in order to be able to repeat the procedure from the appropriate point or to restart the old version
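
As a rough sketch of that backup step (directory locations are assumptions; substitute whatever dfs.name.dir points to on your namenode):

```sh
# Back up the namenode metadata before upgrading.
DFS_NAME_DIR=/path/to/dfs/name        # value of dfs.name.dir
BACKUP_DIR=/path/to/upgrade-backup
mkdir -p "${BACKUP_DIR}"
cp -p "${DFS_NAME_DIR}/current/fsimage" "${DFS_NAME_DIR}/current/edits" "${BACKUP_DIR}/"
```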

Re: tuning performance

2009-03-13 Thread Scott Carey
On 3/13/09 11:56 AM, "Allen Wittenauer" wrote: On 3/13/09 11:25 AM, "Vadim Zaliva" wrote: >>When you stripe you automatically make every disk in the system have the >> same speed as the slowest disk. In our experiences, systems are more likely >> to have a 'slow' disk than a dead one a

Re: Creating Lucene index in Hadoop

2009-03-13 Thread Ning Li
Or you can check out the index contrib. The difference between the two is: - In Nutch's indexing map/reduce job, indexes are built in the reduce phase. Afterwards, they are merged into a smaller number of shards if necessary. The last time I checked, the merge process does not use map/reduce. - I

Re: how to optimize mapreduce procedure??

2009-03-13 Thread Ning Li
I would agree with Enis. MapReduce is good for batch building large indexes, but not for search which requires realtime response. Cheers, Ning On Fri, Mar 13, 2009 at 10:58 AM, Enis Soztutar wrote: > ZhiHong Fu wrote: >> >> Hello, >> >>           I'm writing a program which will finish lucene s

Re: tuning performance

2009-03-13 Thread Allen Wittenauer
On 3/13/09 11:25 AM, "Vadim Zaliva" wrote: >>    When you stripe you automatically make every disk in the system have the >> same speed as the slowest disk.  In our experiences, systems are more likely >> to have a 'slow' disk than a dead one and detecting that is really >> really hard.  I

Re: Hello, world for Hadoop + Lucene

2009-03-13 Thread Ning Li
Sorry for the late reply. You can refer to the test case TestIndexUpdater.java as an example. It uses the index contrib to build a Lucene index and verifies by querying on the index built. Cheers, Ning On Wed, Jan 14, 2009 at 12:05 PM, John Howland wrote: > Howdy! > > Is there any sort of "Hell

Re: tuning performance

2009-03-13 Thread Vadim Zaliva
>    When you stripe you automatically make every disk in the system have the > same speed as the slowest disk.  In our experiences, systems are more likely > to have a 'slow' disk than a dead one and detecting that is really > really hard.  In a distributed system, that multiplier effect can h

Changing logging level

2009-03-13 Thread Amandeep Khurana
I am using the DistributedFileSystem class to read data from HDFS (with some source code of HDFS modified by me). When I read a file, I'm getting all the DEBUG-level log messages on the stdout of the client that I wrote. How can I change the level to INFO? I haven't mentioned the debug level anywhere.

Re: how to upload files by web page

2009-03-13 Thread nitesh bhatia
Hi, I was also looking for a solution to the same problem. I haven't tested it, but I think we can use the Globus Toolkit's GSI-FTP feature for this. In the RSL config file one can write the hdfs copy command to copy the file to HDFS. I've used this feature to upload and process files from Globus to Sun

How to skip bad records in .19.1

2009-03-13 Thread Sharad Agarwal
comments are inline: 柳松 wrote: Dear all: I have set the value "SkipBadRecords.setMapperMaxSkipRecords(conf, 1)", and also the "SkipBadRecords.setAttemptsToStartSkipping(conf, 2)". However, after 3 failed attempts, it gave me this exception message: java.lang.NullPointerException
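
For reference, the two calls quoted above would sit in the job setup roughly as follows (only the two SkipBadRecords lines come from the thread; the surrounding class is a placeholder):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipBadRecordsSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Skip at most one record around a failure, and start skipping
    // after two failed attempts of the same task.
    SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
    SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
    // ... the rest of the job configuration and JobClient.runJob(conf) ...
  }
}
```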

Re: how to optimize mapreduce procedure??

2009-03-13 Thread Enis Soztutar
ZhiHong Fu wrote: Hello, I'm writing a program that performs Lucene searches over about 12 index directories, all of which are stored in HDFS. It works like this: 1. We build about 12 index directories through Lucene's indexing functionality, each about 100 MB in size. 2. We store thes

Reduce task going away for 10 seconds at a time

2009-03-13 Thread Doug Cook
Hi folks, I've been debugging a severe performance problem with a Hadoop-based application (a highly modified version of Nutch). I've recently upgraded to Hadoop 0.19.1 from a much, much older version, and a reduce that used to work just fine is now running orders of magnitude more slowly. >Fr

csv input format handling and mapping

2009-03-13 Thread Stefan Podkowinski
Hi, can anyone share their experience or a solution for the following problem? I have to deal with a lot of different file formats, most of them CSV. They share similar semantics, i.e., fields in file A exist in file B as well. What I'm not sure of is the exact index of the field in the csv
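
One common way to cope with field positions that differ per file, offered here purely as an illustration (nothing below comes from the thread), is to resolve column indices from a header line instead of hard-coding them:

```java
import java.util.HashMap;
import java.util.Map;

public class CsvHeaderIndex {
  // Build a column-name -> index map from a CSV header line.
  public static Map<String, Integer> indexOf(String headerLine) {
    Map<String, Integer> index = new HashMap<String, Integer>();
    String[] names = headerLine.split(",");
    for (int i = 0; i < names.length; i++) {
      index.put(names[i].trim(), i);
    }
    return index;
  }

  public static void main(String[] args) {
    Map<String, Integer> idx = indexOf("id,name,amount");
    String[] fields = "42,widget,9.99".split(",");
    System.out.println(fields[idx.get("amount")]);  // prints 9.99
  }
}
```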

Re: how to preserve original line order?

2009-03-13 Thread Miles Osborne
associate with each line an identifier (e.g., line number) and afterwards re-sort the data by that. Miles 2009/3/13 Roldano Cattoni : > The task should be simple, I want to put in uppercase all the words of a > (large) file. > > I tried the following: >  - streaming mode >  - the mapper is a perl scri
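
Outside of Hadoop, the idea reduces to something like the shell sketch below (file names are placeholders): tag every line with its line number before processing, then sort on that tag afterwards and strip it.

```sh
nl -ba input.txt > numbered.txt                   # "<lineno>\t<original line>"
# ... run the uppercasing job over numbered.txt, leaving the leading number intact ...
sort -n -k1,1 processed.txt | cut -f2- > ordered.txt
```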