Thanks for your answer. I found the problem: I forgot to implement the
Writable interface in my input values' class.
Best wishes
Song Liu from Suzhou University
On 2009-03-13 17:19:35, "Sharad Agarwal" wrote:
> comments are inline:
>
> 柳松 wrote:
>
> Dear all:
> I have set the value "Sk
wget http://namenode:port/data/filename
will return the file.
The namenode will redirect the http request to a datanode that has at least
some of the blocks in local storage to serve the actual request.
The key piece of course is the /data prefix on the file name.
port is the port that the web interface runs on.
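For anyone who prefers doing this from code, here is a minimal Java sketch of the same trick; the port (50070) and file path are illustrative placeholders, and HttpURLConnection follows the namenode's redirect to a datanode automatically:

  import java.io.BufferedReader;
  import java.io.InputStreamReader;
  import java.net.URL;

  public class HdfsHttpRead {
    public static void main(String[] args) throws Exception {
      // The /data prefix asks the namenode to serve the file's contents;
      // hostname, port, and path below are placeholders.
      URL url = new URL("http://namenode:50070/data/user/alice/part-00000");
      BufferedReader in =
          new BufferedReader(new InputStreamReader(url.openStream()));
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);
      }
      in.close();
    }
  }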
Hello
I realize that using HTTP, you can have a file in HDFS streamed - that
is, the servlet responds to the following request with Content-
Disposition: attachment, and a download is forced (at least from a
browser's perspective) like so:
http://localhost:50075/streamFile?filename=/somewhe
I've used wget with Hadoop Streaming without any problems. Based on the
error code you're getting, I suggest you make sure that you have the proper
write permissions for the directory in which Hadoop does its processing
(e.g., download, convert, ...) on each of the tasktracker machines. The location
wher
I ran into this problem as well and several people on this list provided a
helpful response: once the tasktracker starts, the maximum number of tasks
per node cannot be changed. In my case, I've solved this challenge by
stopping and starting mapred (stop-mapred.sh, start-mapred.sh) between jobs.
T
My cluster nodes have 2 dual-core processors, so, in general, I want
to configure my nodes with a maximum of 3 task processes executed per
node at a time.
But, for some jobs, my tasks load large amounts of memory, and I
cannot fit 3 such tasks on a single node. For these jobs, I'd like to
enforce a lower per-node task limit.
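For reference, the knob involved is mapred.tasktracker.map.tasks.maximum (and its reduce twin). A hedged sketch of what setting it looks like; as the previous message notes, the TaskTracker only reads these at startup, so in practice they belong in hadoop-site.xml and need a daemon restart to take effect:

  import org.apache.hadoop.mapred.JobConf;

  public class TaskLimits {
    public static void main(String[] args) {
      JobConf conf = new JobConf();
      // Read once by each TaskTracker when it starts; a per-job
      // override here will not change a running daemon.
      conf.setInt("mapred.tasktracker.map.tasks.maximum", 2);
      conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 2);
    }
  }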
Thanks.
So, the logging that I wanted to tweak was at the client end, where I am
using the DistributedFileSystem class instead of using the shell to read
data. Changing the logging level there can't be done through these methods.
I got it to work by rebuilding the jars after tweaking the default log4j.properties.
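A hedged alternative that avoids rebuilding the jars, assuming log4j is the backend (as in stock Hadoop) and that the noisy loggers live under org.apache.hadoop:

  import org.apache.log4j.Level;
  import org.apache.log4j.Logger;

  public class QuietClient {
    public static void main(String[] args) {
      // Raise the threshold before touching DistributedFileSystem so
      // DEBUG output from the Hadoop classes is suppressed.
      Logger.getLogger("org.apache.hadoop").setLevel(Level.INFO);
      // ... open the DistributedFileSystem and read as before ...
    }
  }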
Two ways:
In hadoop-site.xml add:

  <property>
    <name>mapred.task.profile</name>
    <value>true</value>
    <description>Set profiling option to true.</description>
  </property>
  <property>
    <name>mapred.task.profile.maps</name>
    <value>1</value>
    <description>Profiling level of maps.</description>
  </property>
  <property>
    <name>mapred.task.profile.reduces</name>
    <value>1</value>
    <description>Profiling level of reducers.</description>
  </property>

Or in your code add:
JobConf.setProfileEnabled(true);
JobConf
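The truncated line above is presumably setting the profiled task ranges; a minimal sketch of the programmatic route, using the instance methods as they appear in the 0.19-era JobConf (the ranges are illustrative):

  import org.apache.hadoop.mapred.JobConf;

  public class ProfiledJob {
    public static void main(String[] args) {
      JobConf conf = new JobConf();
      conf.setProfileEnabled(true);           // turn task profiling on
      conf.setProfileTaskRange(true, "0-1");  // map task ids to profile
      conf.setProfileTaskRange(false, "0-1"); // reduce task ids to profile
    }
  }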
Hey Lukas, we love hearing about what you'd like to see in training.
If you make a note on Get Satisfaction, we'll track it and keep you
apprised of updates:
http://getsatisfaction.com/cloudera/products/cloudera_hadoop_training
Christophe
On Fri, Mar 13, 2009 at 2:27 PM, Lukáš Vlček wrote:
> Hi
fwiw, we have released a workaround for this issue in Cascading 1.0.5.
http://www.cascading.org/
http://cascading.googlecode.com/files/cascading-1.0.5.tgz
In short, Hadoop 0.19.0 and .1 instantiate the user's Reducer class and
subsequently call configure() when there is no intention to use the
On Mar 13, 2009, at 3:56 PM, Richa Khandelwal wrote:
You can initialize IntWritable with an empty constructor.
IntWritable i=new IntWritable();
NullWritable is better for that application than IntWritable. It
doesn't consume any space when serialized. *smile*
-- Owen
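A minimal sketch of Owen's suggestion in the old (0.19-era) mapred API; the class name is illustrative:

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class KeyOnlyMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, NullWritable> {
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, NullWritable> output,
                    Reporter reporter) throws IOException {
      // NullWritable serializes to zero bytes, so the unused value
      // costs nothing on disk or on the wire.
      output.collect(value, NullWritable.get());
    }
  }

The driver would also declare conf.setOutputValueClass(NullWritable.class) so the job's types line up.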
You can initialize IntWritable with an empty constructor.
IntWritable i=new IntWritable();
On Fri, Mar 13, 2009 at 2:21 PM, Andy Sautins
wrote:
>
>
> In writing a Map/Reduce job I ran across something I found a little
> strange. I have a situation where I don't need a value output from map.
>
Hi,
This is excellent!
Do any of these presentations deal specifically with processing tree and
graph data structures? I know that some basics can be found in the fifth
MapReduce lecture here (http://www.youtube.com/watch?v=BT-piFBP4fE)
presented by Aaron Kimball or here (
http://video.google.co
In writing a Map/Reduce job I ran across something I found a little
strange. I have a situation where I don't need a value output from map.
If I set the value of the OutputCollector to
null I get the following exception:
java.lang.NullPointerException
at
org.apache.hadoop.ma
There may be a separate issue with Windows, but the error related to:
[javac] import
org.eclipse.jdt.internal.debug.ui.launcher.JavaApplicationLaunchShortcut;
is the eclipse 3.4 issue that is addressed by the patch in
https://issues.apache.org/jira/browse/HADOOP-3744
Hey there, today we released our basic Hadoop and Hive training
online. Access is free, and we address questions through Get
Satisfaction.
Many on this list are surely pros, but when you have friends trying to
get up to speed, feel free to send this along. We provide a VM so new
users can start do
Step 8 of the upgrade process mentions copying the 'edits' and 'fsimage'
files to a backup directory. After step 19 it says:
'In case of failure the administrator should have the checkpoint files
in order to be able to repeat the procedure from the appropriate point
or to restart the old version
On 3/13/09 11:56 AM, "Allen Wittenauer" wrote:
On 3/13/09 11:25 AM, "Vadim Zaliva" wrote:
>> When you stripe you automatically make every disk in the system have the
>> same speed as the slowest disk. In our experience, systems are more likely
>> to have a 'slow' disk than a dead one a
Or you can check out the index contrib. The difference between the two is that:
- In Nutch's indexing map/reduce job, indexes are built in the
reduce phase. Afterwards, they are merged into a smaller number of
shards if necessary. The last time I checked, the merge process does
not use map/reduce.
- I
I would agree with Enis. MapReduce is good for batch-building large
indexes, but not for search, which requires realtime response.
Cheers,
Ning
On Fri, Mar 13, 2009 at 10:58 AM, Enis Soztutar wrote:
> ZhiHong Fu wrote:
>>
>> Hello,
>>
>> I'm writing a program which will finish lucene s
On 3/13/09 11:25 AM, "Vadim Zaliva" wrote:
>> When you stripe you automatically make every disk in the system have the
>> same speed as the slowest disk. In our experience, systems are more likely
>> to have a 'slow' disk than a dead one and detecting that is really
>> really hard. I
Sorry for the late reply. You can refer to the test case
TestIndexUpdater.java as an example. It uses the index contrib to
build a Lucene index and verifies it by querying the resulting index.
Cheers,
Ning
On Wed, Jan 14, 2009 at 12:05 PM, John Howland wrote:
> Howdy!
>
> Is there any sort of "Hell
> When you stripe you automatically make every disk in the system have the
> same speed as the slowest disk. In our experience, systems are more likely
> to have a 'slow' disk than a dead one and detecting that is really
> really hard. In a distributed system, that multiplier effect can h
I am using the DistributedFileSystem class to read data from HDFS (with some
HDFS source code modified by me). When I read a file, I get all
debug-level log messages on stdout in the client that I wrote. How can
I change the level to info? I haven't set the debug level anywhere.
Hi
I was also looking for a solution to the same problem. I haven't tested
it, but I think we can use the Globus Toolkit's GSI-FTP feature for this.
In the RSL config file one can write the hdfs copy command to copy
the file to HDFS. I've used this feature to upload and process files
from Globus to Sun
comments are inline:
柳松 wrote:
Dear all:
I have set the value "SkipBadRecords.setMapperMaxSkipRecords(conf, 1)",
and also the "SkipBadRecords.setAttemptsToStartSkipping(conf, 2)".
However, after 3 failed attempts, it gave me this exception message:
java.lang.NullPointerException
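For reference, a minimal sketch of the two settings the original message describes, in driver code (the class name is hypothetical):

  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.SkipBadRecords;

  public class SkippingJob {
    public static void main(String[] args) {
      JobConf conf = new JobConf(SkippingJob.class);
      // Enter skipping mode only after 2 task attempts have failed.
      SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
      // Allow at most 1 record to be skipped per bad-record range.
      SkipBadRecords.setMapperMaxSkipRecords(conf, 1);
    }
  }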
ZhiHong Fu wrote:
Hello,
I'm writing a program which does Lucene searching across
about 12 index directories, all of them stored in HDFS. It works
like this:
1. We get about 12 index directories through the Lucene index
functionality, each of which is about 100M in size.
2. We store thes
Hi folks,
I've been debugging a severe performance problem with a Hadoop-based
application (a highly modified version of Nutch). I've recently upgraded to
Hadoop 0.19.1 from a much, much older version, and a reduce that used to
work just fine is now running orders of magnitude more slowly.
>Fr
Hi
Can anyone share his experience or solution for the following problem?
I have to deal with a lot of different file formats, most of them CSV.
Each of them shares similar semantics, i.e. fields in file A exist in
file B as well.
What I'm not sure of is the exact index of the field in the CSV
associate with each line an identifier (e.g. line number) and afterwards
re-sort the data by that
Miles
2009/3/13 Roldano Cattoni :
> The task should be simple: I want to convert all the words of a
> (large) file to uppercase.
>
> I tried the following:
> - streaming mode
> - the mapper is a perl scri
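As a plain Java alternative to the Perl streaming mapper, a hypothetical sketch that also follows the line-identifier advice above: keep the input byte offset as the key, so the output can be re-sorted into the original line order afterwards.

  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.MapReduceBase;
  import org.apache.hadoop.mapred.Mapper;
  import org.apache.hadoop.mapred.OutputCollector;
  import org.apache.hadoop.mapred.Reporter;

  public class UppercaseMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, LongWritable, Text> {
    public void map(LongWritable offset, Text line,
                    OutputCollector<LongWritable, Text> out,
                    Reporter reporter) throws IOException {
      // The byte offset doubles as a line identifier; sorting the
      // output by it restores the original line order.
      out.collect(offset, new Text(line.toString().toUpperCase()));
    }
  }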