Hello,
sorry my mistake. Problem solved.
On 06/19/2012 03:40 PM, Devaraj k wrote:
Can you share the exception stack trace and the piece of code where you are
trying to create it?
Thanks
Devaraj
From: Ondřej Klimpera [klimp...@fit.cvut.cz]
Sent: Tuesday …
Hello, I'm trying to use a MapFile (stored on HDFS) in my reduce task,
which processes some text data.
When I try to initialize the MapFile.Reader in the reducer's configure()
method, the app throws a NullPointerException; when the same approach is
used for each reduce() method call with the same parameters, eve…
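For reference, a minimal old-API sketch of the configure()-time initialization
being described here; the HDFS path "/data/lookup" and the class name are
made up for illustration, not taken from the poster's code:

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class LookupReducer extends MapReduceBase /* implements Reducer<...> */ {
    private MapFile.Reader reader;

    @Override
    public void configure(JobConf job) {
        try {
            FileSystem fs = FileSystem.get(job);
            // open the MapFile directory once per task; reuse it in every reduce() call
            reader = new MapFile.Reader(fs, "/data/lookup", job);
        } catch (IOException e) {
            throw new RuntimeException("Cannot open MapFile", e);
        }
    }

    @Override
    public void close() throws IOException {
        if (reader != null) reader.close();
    }
}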
… lines per map task, or N wider distributed maps?
On Sat, Jun 16, 2012 at 3:01 PM, Ondřej Klimpera wrote:
I tried this approach, but the job is not distributed among 10 mapper nodes.
Hadoop seems to ignore this property :(
My first thought is that the small file size is the problem and Hadoop
doesn't …
…, Bejoy KS wrote:
Hi Ondrej
You can use NLineInputFormat with n set to 10.
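A minimal driver sketch of this suggestion against the old (mapred) API; the
class name and argument handling are placeholders:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class NLineDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(NLineDriver.class);
        conf.setInputFormat(NLineInputFormat.class);
        // every split, and therefore every map task, receives 10 input lines
        conf.setInt("mapred.line.input.format.linespermap", 10);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}

Note this controls the number of map tasks (10 for a 100-line file), not where
they run; placing them on 10 distinct nodes is up to the scheduler.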
------Original Message------
From: Ondřej Klimpera
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: Setting number of mappers according to number of TextInput lines
Sent: Jun 16, 2012 14:31
Hello,
I have a very small input size (kB), but the processing needed to produce
the output takes several minutes. Is there a way to say: the file has 100
lines, I need 10 mappers, where each mapper node has to process 10 lines
of the input file?
Thanks for advice.
Ondrej Klimpera
…nt it only on 3-4 machines (or less),
instead of all, to avoid that?
On Thu, Jun 14, 2012 at 7:28 PM, Ondřej Klimpera wrote:
Hello,
you're right, that's exactly what I meant. And your answer is exactly what
I thought. I was just wondering if Hadoop can distribute the data to other
nodes' …
…you'd have ~50 GB of space available for
HDFS consumption (assuming replication = 3 for proper reliability).
On Thu, Jun 14, 2012 at 1:25 PM, Ondřej Klimpera wrote:
Hello,
we're testing an application on 8 nodes, where each node has 20GB of local
storage available. What we are trying to achieve is to get more than 20GB
to be processed on this cluster.
…a new packaging format, Apache Hadoop 1.x has deprecated
the HADOOP_HOME env-var in favor of a new env-var called
'HADOOP_PREFIX'. You can set HADOOP_PREFIX, or set
HADOOP_HOME_WARN_SUPPRESS in your environment to a non-empty value to
suppress the warning.
On Thu, Jun 14, 2012 at 1…
Hello,
we're testing an application on 8 nodes, where each node has 20GB of local
storage available. What we are trying to achieve is to get more than
20GB to be processed on this cluster.
Is there a way to distribute the data across the cluster?
There is also one shared NFS storage disk with 1…
Hello, when running Hadoop, why is the HADOOP_HOME shell variable
always reported as deprecated? How do I set the installation directory
on cluster nodes, and which variable is correct?
Thanks
Ondrej Klimpera
Hello,
I'd like to ask how Hadoop splits text input if its size is smaller
than the HDFS block size.
I'm testing an application which creates large outputs from small inputs.
When using the NInputSplits input format and setting the number of splits
in mapred-site.xml, some results are lost during w…
…support MultipleOutputs?
Thanks again.
On 04/30/2012 12:32 AM, Bill Graham wrote:
Take a look at the JobClient API. You can use that to get the current
progress of a running job.
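A rough polling sketch using the 0.20 mapred JobClient; the job id string is
assumed to come from whatever code submitted the job:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class JobProgressPoller {
    public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName(args[0]));
        if (job == null) {
            System.err.println("unknown job id: " + args[0]);
            return;
        }
        while (!job.isComplete()) {
            // map/reduce progress are reported as fractions between 0 and 1
            System.out.printf("map %.0f%%  reduce %.0f%%%n",
                    job.mapProgress() * 100, job.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println(job.isSuccessful() ? "job succeeded" : "job failed");
    }
}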
On Sunday, April 29, 2012, Ondřej Klimpera wrote:
Hello, I'd like to ask you what is the preferred way of getting a running
job's progress…
Hello, I'd like to ask you what is the preferred way of getting a running
job's progress from the Java application that executed it.
I'm using Hadoop 0.20.203 and tried the job.end.notification.url property,
which works well, but as the property name says, it sends only job-end
notifications.
What I need…
Thanks, I'll try to implement it and let you know if it worked.
On 04/18/2012 04:07 PM, Harsh J wrote:
Since you're looking for per-line (and not per-task/file) monitoring,
this is best done by your own application code (a monitoring thread,
etc.).
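One possible shape for such a monitoring thread in the old (mapred) API;
processLine() and the 60-second budget are placeholders, not anything from
Hadoop itself:

import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TimeoutMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public void map(LongWritable key, final Text value,
                    OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        Future<?> task = worker.submit(new Runnable() {
            public void run() {
                processLine(value.toString()); // the expensive per-line work
            }
        });
        try {
            task.get(60, TimeUnit.SECONDS); // per-line time budget
        } catch (TimeoutException e) {
            // give up on this line and move to the next one; note cancel()
            // only interrupts, so code that ignores interruption still blocks
            task.cancel(true);
            reporter.incrCounter("app", "timed-out-lines", 1);
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

    private void processLine(String line) { /* ... */ }

    public void close() { worker.shutdownNow(); }
}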
On Wed, Apr 18, 2012 at 6:09…
Hello, I'd like to ask if there is a possibility of setting a timeout for
processing one line of text input in the mapper function.
The idea is that if processing of one line takes too long, Hadoop will cut
this processing short and continue with the next input line.
Thank you for your answer.
…distributed mode. Do let us know if it does not work as
intended.
On Sun, Apr 8, 2012 at 11:40 PM, Ondřej Klimpera wrote:
Thanks for your advice, File.createTempFile() works great, at least in
pseudo-distributed mode; I hope the cluster setup will behave the same. You
saved me hours of trying...
On …
…these would also be automatically deleted after the task
attempt is done.
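A tiny sketch of that, assuming it runs inside a task attempt (where the
current directory is the attempt's private working dir):

import java.io.File;
import java.io.IOException;

public class TempFileDemo {
    public static void main(String[] args) throws IOException {
        // inside a task, "." is the attempt's working directory, which
        // the framework removes when the attempt finishes
        File tmp = File.createTempFile("scratch-", ".tmp", new File("."));
        System.out.println("created " + tmp.getAbsolutePath());
    }
}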
On Sun, Apr 8, 2012 at 2:14 AM, Ondřej Klimpera wrote:
Hello,
I would like to ask if it is possible to create and work with a
temporary file while inside a map function.
I suppose that the map function runs on a single node in the Hadoop cluster…
Hello,
I would like to ask if it is possible to create and work with a
temporary file while inside a map function.
I suppose that the map function runs on a single node in the Hadoop
cluster. So what is a safe way to create a temporary file and read from
it within one map() run? If it is possible…
…2012 14:30, Ondřej Klimpera wrote:
And one more question: is it even possible to add a MapFile (as it
consists of an index and a data file) to the distributed cache?
Thanks
Should be no problem, they are just two files.
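A driver-side sketch of caching both halves; the path /maps/lookup is
hypothetical:

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class CacheMapFileDriver {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(CacheMapFileDriver.class);
        // a MapFile is an HDFS directory containing exactly two files,
        // "data" and "index"; add each one to the cache
        DistributedCache.addCacheFile(new URI("/maps/lookup/data"), conf);
        DistributedCache.addCacheFile(new URI("/maps/lookup/index"), conf);
    }
}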
On 03/30/2012 01:15 PM, Ondřej Klimpera wrote:
Hello,
I'm not sure what you mean by using the MapReduce setup() method…
And one more question: is it even possible to add a MapFile (as it
consists of an index and a data file) to the distributed cache?
Thanks
On 03/30/2012 01:15 PM, Ondřej Klimpera wrote:
Hello,
I'm not sure what you mean by using the MapReduce setup() method.
"If the file is that small you could load…
…Eugen Stan wrote:
Hello Ondrej,
On 29.03.2012 18:05, Ondřej Klimpera wrote:
Hello,
I have a MapFile as a product of a MapReduce job, and what I need to do
is:
1. If MapReduce produced multiple splits as output, merge them into a
single file.
2. Copy this merged MapFile to another HDFS location and use it as a
distributed cache file for another MapReduce job.
Hello, I've got one more question: how is the seek() (or get()) method
implemented in MapFile.Reader? Does it use hashCode(), compareTo(), or
another mechanism to find a match in the MapFile's index?
Thanks for your reply.
Ondrej Klimpera
On 03/29/2012 08:26 PM, Ondřej Klimpera wrote:
Thanks for your fast reply, I'll try this approach :)
Thanks for your fast reply, I'll try this approach:)
On 03/29/2012 05:43 PM, Deniz Demir wrote:
Not sure if this helps in your use case, but you can put all output files
into the distributed cache and then access them in the subsequent
map-reduce job (in the driver code):
// previous mr-job's output …
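The quoted code is cut off by the archive; here is a sketch of the
driver-side idea under the same assumptions (the output path is made up):

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ChainDriver {
    public static void main(String[] args) throws Exception {
        JobConf second = new JobConf(ChainDriver.class);
        Path prevOutput = new Path("/jobs/first/output"); // hypothetical
        FileSystem fs = FileSystem.get(second);
        for (FileStatus st : fs.listStatus(prevOutput)) {
            if (!st.isDir()) { // skip _logs and other subdirectories
                DistributedCache.addCacheFile(st.getPath().toUri(), second);
            }
        }
        // ...configure mapper/reducer/input/output of the second job here
    }
}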
Hello,
I have a MapFile as a product of a MapReduce job, and what I need to do is:
1. If MapReduce produced multiple splits as output, merge them into a
single file.
2. Copy this merged MapFile to another HDFS location and use it as a
distributed cache file for another MapReduce job.
I'm wondering if…
…having deprecated things in my projects.
Thanks.
On 01/25/2012 01:46 PM, Ondřej Klimpera wrote:
I'm using 1.0.0 beta; I suppose it was a wrong decision to use a beta
version. So do you recommend using 0.20.203.X and sticking to the
approaches from Hadoop: The Definitive Guide?
Thanks for your reply.
On 01/25/…
…I'd still recommend sticking to the stable API if you are using a
0.20.x/1.x stable Apache release.
On Wed, Jan 25, 2012 at 5:13 PM, Ondřej Klimpera wrote:
Hello,
I'm trying to develop an application where the Reducer has to produce
multiple outputs.
In detail, I need the Reducer to produce two types of files…
Hello,
I'm trying to develop an application where the Reducer has to produce
multiple outputs.
In detail, I need the Reducer to produce two types of files. Each file
will have a different output.
I found in Hadoop: The Definitive Guide that the new API uses only
MultipleOutputs, but working with MultipleOutputs…
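Since the archive cuts the message off, here is a sketch of two named outputs
using the old (mapred) API's MultipleOutputs, which is the stable API
recommended above; the output names "summary" and "detail" are invented:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

public class TwoFileReducer extends MapReduceBase
        implements Reducer<Text, Text, Text, Text> {

    private MultipleOutputs mos;

    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }

    @SuppressWarnings("unchecked")
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> out, Reporter reporter)
            throws IOException {
        while (values.hasNext()) {
            Text v = values.next();
            // route each record to one of the two named output files
            mos.getCollector("summary", reporter).collect(key, v);
            mos.getCollector("detail", reporter).collect(key, v);
        }
    }

    public void close() throws IOException {
        mos.close();
    }
}

In the driver, each named output is registered up front, e.g.:

MultipleOutputs.addNamedOutput(conf, "summary",
        org.apache.hadoop.mapred.TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(conf, "detail",
        org.apache.hadoop.mapred.TextOutputFormat.class, Text.class, Text.class);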