Hi Dmitry,
What version of Hadoop are you using?
Assuming your 3G DB is a read-only lookup... can you load it into
memory in Map.configure() and then use (0.19+ only):
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>-1</value>
</property>
so that the map JVMs are reused for the lifetime of the job.
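A minimal sketch of that pattern with the old org.apache.hadoop.mapred API; the lookup types and the loading step are placeholders, since I don't know your DB format:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LookupMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  // Static, so with JVM reuse enabled the table survives across tasks.
  private static Map<String, String> lookup;

  @Override
  public void configure(JobConf job) {
    synchronized (LookupMapper.class) {
      if (lookup == null) {              // load once per JVM, not once per task
        lookup = new HashMap<String, String>();
        // ... read the read-only DB into the map here ...
      }
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    String hit = lookup.get(value.toString());
    if (hit != null) {
      output.collect(value, new Text(hit));
    }
  }
}

Note that configure() runs again for every task the JVM is reused for, so the null check is what makes the expensive load happen only once.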
Hi,
Try to use:
conf.setJarByClass(EchoOche.class); // conf is the JobConf instance of your example.
Hope this helps,
Rasit
2009/1/20 Shyam Sarkar shyam.s.sar...@gmail.com
Hi,
I was trying to run the Hadoop WordCount version 2 example under Cygwin. I
tried it without the pattern.txt file -- it
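For context, roughly where Rasit's call sits in a driver; the class and job names below are placeholders, not Shyam's actual code:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCount {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf();
    conf.setJobName("wordcount");
    conf.setJarByClass(WordCount.class); // tells Hadoop which jar to ship to the cluster
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);
    // conf.setMapperClass(...); conf.setReducerClass(...);  // plug in your classes
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

Without the setJarByClass (or the JobConf(Class) constructor), Hadoop may not know which jar contains your mapper and reducer, which leads to ClassNotFoundException-style failures on the task trackers.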
Amit k. Saha wrote:
On Wed, Jan 21, 2009 at 5:53 PM, Matthias Scherer
matthias.sche...@1und1.de wrote:
Hi all,
We've taken our first steps in evaluating Hadoop. The setup of 2 VMs as a
Hadoop grid was very easy and works fine.
Now our operations team wonders why Hadoop has to be able to
Hi,
I have a task to process large quantities of files by converting them
into other formats. Each file is processed as a whole and converted to a
target format. Since there are hundreds of GB of data, I thought it suitable
for Hadoop, but the problem is that I don't think the files can be broken
apart
Hi Steve and Amit,
Thanks for your answers. I agree with you that key-based SSH is nothing to
worry about. But I'm wondering what exactly - that is, which grid
administration tasks - Hadoop does via SSH. Does it restart crashed datanodes
or tasktrackers on the slaves? Or does it
Matthias Scherer wrote:
Hi Steve and Amit,
Thanks for your answers. I agree with you that key-based SSH is nothing to
worry about. But I'm wondering what exactly - that is, which grid
administration tasks - Hadoop does via SSH. Does it restart crashed datanodes
or tasktrackers on the
Hi Matthias,
It is not necessary to have SSH set up to run Hadoop, but it does make
things easier. SSH is used by the scripts in the bin directory which
start and stop daemons across the cluster (the slave nodes are defined
in the slaves file), see the start-all.sh script as a starting point.
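For example (the host names here are made up), conf/slaves is just a list of worker machines, one per line, and the start script loops over it with ssh:

$ cat conf/slaves
worker1
worker2
$ bin/start-all.sh   # ssh-es to each slave to launch the DataNode and TaskTracker daemons

Without SSH set up you can still bring the cluster up by running bin/hadoop-daemon.sh start datanode (and start tasktracker) on each node yourself; the ssh loop is only a convenience.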
You can do that. I did a MapReduce job over about 6 GB of PDFs to
concatenate them, and The New York Times used Hadoop to process a few TB of
PDFs.
What I would do is this:
- Use the iText library, a Java library for PDF manipulation (I don't know
what you would use for reading Word docs); see the sketch below
- Don't
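To give a flavor of the concatenation step, a sketch against the iText 2.x API of that era; how you get a byte[] per PDF out of your input format is left open:

import java.io.OutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.pdf.PdfCopy;
import com.lowagie.text.pdf.PdfReader;

public class PdfConcat {
  // Concatenate several PDFs (each passed as a byte[]) onto one output stream.
  public static void concat(Iterable<byte[]> pdfs, OutputStream out) throws Exception {
    Document doc = new Document();
    PdfCopy copy = new PdfCopy(doc, out);
    doc.open();
    for (byte[] pdf : pdfs) {
      PdfReader reader = new PdfReader(pdf);
      for (int page = 1; page <= reader.getNumberOfPages(); page++) {
        copy.addPage(copy.getImportedPage(reader, page));
      }
    }
    doc.close();
  }
}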
Hmmm ...
From a space-efficiency perspective, given that HDFS (with its large block
size) expects large files, is Hadoop optimized for processing a large number of
small files? Does each file take up at least one block, or can multiple files
sit in the same block?
Rgds,
Ricky
Richard,
Thanks for the suggestion. I actually am building an EC2 architecture
to facilitate this! I tried using a database to warehouse the files, and
then NFS, but the connection load is too heavy. So I thought maybe HDFS
could be used just to mitigate the data access across all the
Ricky,
Hadoop is primarily optimized for large files, usually files larger
than one input split. However, there is an input format called
MultiFileInputFormat which can be used to make Hadoop work efficiently
on smaller files. You can also override the isSplitable method of an input
format so that files are not split, as sketched below.
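A sketch of the non-splittable variant with the old mapred API (the class name is mine; only the isSplitable override matters):

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// An input format whose files are never split across map tasks.
public class NonSplittableTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;  // hand each file to exactly one mapper
  }
}

Each input file then becomes exactly one map task, which suits whole-file conversions.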
Jim, thanks for your explanation. But isn't isSplitable an option for writing
output rather than reading input?
There are two phases:
1) Upload the data from the local file system to HDFS. Is there an option in the
hadoop fs copy to pack multiple small files into a single block and also not splitting
Darren- I would definitely use HDFS to get the data to all the instances.
I'm not sure about your 32 processes or SQS, but let me/us know what you
find.
Richard J. Zak
HBase 0.19.0 is now available for download
http://hadoop.apache.org/hbase/releases.html
Thanks to all who contributed to this release. 185 issues have been
fixed since HBase 0.18.0. Release notes are available here:
http://tinyurl.com/8xmyx9
At your service,
The HBase Team
hadoop distcp http://core:7274/logs/log.20090121 /user/dyoung/mylogs
This fails:
With failures, global counters are inaccurate; consider running with -i
Copy failed: java.io.IOException: No FileSystem for scheme: http
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1364)
Hi Derek,
The "http" in "http://core:7274/logs/log.20090121" should be "hftp": hftp is
the scheme name of HftpFileSystem, which uses HTTP to access HDFS.
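With the scheme swapped in, the copy becomes (assuming core:7274 is in fact
the namenode's HTTP address, as your URL suggests):

hadoop distcp hftp://core:7274/logs/log.20090121 /user/dyoung/mylogs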
Hope this helps.
Nicholas Sze
Tsz Wo (Nicholas), Sze s29752-hadoopu...@... writes:
Hi Derek,
The "http" in "http://core:7274/logs/log.20090121" should be "hftp": hftp is
the scheme name of HftpFileSystem, which uses HTTP to access HDFS.
Hope this helps.
Nicholas Sze
I thought hftp was used to talk to servlets
Is Hadoop designed to run on homogeneous hardware only, or does it work just
as well on heterogeneous hardware? If the datanodes have different
disk capacities, does HDFS still spread the data blocks equally among all
the datanodes, or will the datanodes with higher disk capacity end up
Reminder - the Bay Area Hadoop User Group meeting is today at 6 pm.
From: Ajay Anand
Sent: Thursday, January 08, 2009 12:10 PM
To: 'core-user@hadoop.apache.org'; 'gene...@hadoop.apache.org';
'zookeeper-u...@hadoop.apache.org'; 'hbase-u...@hadoop.apache.org';
Derek Young wrote:
Reading http://issues.apache.org/jira/browse/HADOOP-341 it sounds like
this should be supported, but the http URLs are not working for me. Are
http source URLs still supported?
No. They used to be supported, but when distcp was converted to accept
any Path this stopped
Hello Hadoop Users,
I was hoping someone would be able to answer a question about node
decommissioning. I have a test Hadoop cluster set up which only consists of my
computer and a master node. I am looking at the removal and addition of nodes.
Adding a node is nearly instant (only about 5
Hey Alyssa,
If one of those datanodes goes down, a few minutes will pass before the master
discovers it: the master treats nodes that have not sent a heartbeat
for a while as dead.
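If you want to remove a node cleanly rather than just killing it, the usual route is decommissioning: point dfs.hosts.exclude at an exclude file (the path below is only an example):

<property>
  <name>dfs.hosts.exclude</name>
  <value>/path/to/excludes</value>
</property>

Then add the node's hostname to that file and run bin/hadoop dfsadmin -refreshNodes. The namenode re-replicates the node's blocks before reporting it as decommissioned, which is why removal takes much longer than adding a node.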
On Thu, Jan 22, 2009 at 8:34 AM, Hargraves, Alyssa aly...@wpi.edu wrote:
Hello Hadoop Users,
I