Re: Creating MapFile.Reader instance in reducer setup

2012-06-19 Thread Ondřej Klimpera
Hello, sorry my mistake. Problem solved. On 06/19/2012 03:40 PM, Devaraj k wrote: Can you share the exception stack trace and piece of code where you are trying to create? Thanks Devaraj From: Ondřej Klimpera [klimp...@fit.cvut.cz] Sent: Tuesday ...

Creating MapFile.Reader instance in reducer setup

2012-06-19 Thread Ondřej Klimpera
Hello, I'm trying to use a MapFile (stored on HDFS) in my reduce task, which processes some text data. When I try to initialize MapFile.Reader in the reducer's configure() method, the app throws a NullPointerException; when the same approach is used for each reduce() method call with the same parameters, eve...
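For reference, a minimal sketch of initializing the reader once per task in the old "mapred" API (Hadoop 0.20.x/1.x); the path and class name are illustrative, not from the thread:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// Open the MapFile.Reader once in configure() so every reduce() call can
// reuse it, instead of re-creating the reader per call.
public class LookupReducer extends MapReduceBase {

  private MapFile.Reader reader;

  @Override
  public void configure(JobConf job) {
    try {
      FileSystem fs = FileSystem.get(job);
      // the directory holding the MapFile's "data" and "index" parts
      reader = new MapFile.Reader(fs, "/data/lookup", job);
    } catch (IOException e) {
      throw new RuntimeException("Cannot open MapFile", e);
    }
  }

  @Override
  public void close() throws IOException {
    reader.close();
  }
}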

Re: Setting number of mappers according to number of TextInput lines

2012-06-17 Thread Ondřej Klimpera
...lines per map task, or N wider distributed maps? On Sat, Jun 16, 2012 at 3:01 PM, Ondřej Klimpera wrote: I tried this approach, but the job is not distributed among 10 mapper nodes. Seems Hadoop ignores this property :( My first thought is that the small file size is the problem and Hadoop doesn't ...

Re: Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Ondřej Klimpera
...Bejoy KS wrote: Hi Ondrej You can use NLineInputFormat with n set to 10. --Original Message------ From: Ondřej Klimpera To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Setting number of mappers according to number of TextInput lines Sent: Jun 16, 2012 14:31 ...

Setting number of mappers according to number of TextInput lines

2012-06-16 Thread Ondřej Klimpera
Hello, I have a very small input (a few kB), but processing it to produce the output takes several minutes. Is there a way to say: the file has 100 lines, I need 10 mappers, and each mapper node has to process 10 lines of the input file? Thanks for advice. Ondrej Klimpera
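A sketch of the NLineInputFormat suggestion from the replies above, for the job driver (old "mapred" API, Hadoop 0.20.x/1.x); the input path is illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

// Each map task receives N consecutive input lines; with N = 10 a
// 100-line file becomes 10 map tasks, regardless of HDFS block size.
JobConf conf = new JobConf();
conf.setInputFormat(NLineInputFormat.class);
conf.setInt("mapred.line.input.format.linespermap", 10);
FileInputFormat.setInputPaths(conf, new Path("/input/file.txt"));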

Re: Dealing with low space cluster

2012-06-14 Thread Ondřej Klimpera
...mount it only on 3-4 machines (or less), instead of all, to avoid that? On Thu, Jun 14, 2012 at 7:28 PM, Ondřej Klimpera wrote: Hello, you're right. That's exactly what I meant. And your answer is exactly what I thought. I was just wondering if Hadoop can distribute the data to other node's ...

Re: Dealing with low space cluster

2012-06-14 Thread Ondřej Klimpera
...have ~50 GB of space available for HDFS consumption (assuming replication = 3 for proper reliability). On Thu, Jun 14, 2012 at 1:25 PM, Ondřej Klimpera wrote: Hello, we're testing an application on 8 nodes, where each node has 20GB of local storage available. What we are trying to achieve is to get ...
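For context, the arithmetic behind that estimate: 8 nodes x 20 GB = 160 GB of raw capacity, and with each block replicated 3 times the usable HDFS space is roughly 160 / 3 ≈ 53 GB, before reserving room for intermediate map output and other non-HDFS files.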

Re: HADOOP_HOME deprecated

2012-06-14 Thread Ondřej Klimpera
...a new packaging format, Apache Hadoop 1.x has deprecated the HADOOP_HOME env-var in favor of a new env-var called 'HADOOP_PREFIX'. You can set HADOOP_PREFIX, or set HADOOP_HOME_WARN_SUPPRESS in your environment to a non-empty value to suppress the warning. On Thu, Jun 14, 2012 at 1...

Dealing with low space cluster

2012-06-14 Thread Ondřej Klimpera
Hello, we're testing an application on 8 nodes, where each node has 20GB of local storage available. What we are trying to achieve is to get more than 20GB processed on this cluster. Is there a way to distribute the data across the cluster? There is also one shared NFS storage disk with 1...

HADOOP_HOME deprecated

2012-06-13 Thread Ondřej Klimpera
Hello, why does running Hadoop always warn that the HADOOP_HOME shell variable is deprecated? How do I set the installation directory on cluster nodes; which variable is correct? Thanks Ondrej Klimpera

How Hadoop splits TextInput?

2012-06-13 Thread Ondřej Klimpera
Hello, I'd like to ask how Hadoop splits text input if its size is smaller than the HDFS block size. I'm testing an application which creates large outputs from small inputs. When using the NInputSplits input format and setting the number of splits in mapred-conf.xml, some results are lost during w...
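For reference, a paraphrase (not a verbatim copy) of how the old-API FileInputFormat sizes splits in Hadoop 0.20.x/1.x, which is why a file smaller than one block normally becomes a single split and hence one map task:

// goalSize comes from the mapred.map.tasks hint; minSize from configuration
long goalSize  = totalSize / Math.max(requestedMaps, 1);
long minSize   = conf.getLong("mapred.min.split.size", 1);
long splitSize = Math.max(minSize, Math.min(goalSize, blockSize));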

Re: Getting job progress in java application

2012-04-30 Thread Ondřej Klimpera
...support MultipleOutputs? Thanks again. On 04/30/2012 12:32 AM, Bill Graham wrote: Take a look at the JobClient API. You can use that to get the current progress of a running job. On Sunday, April 29, 2012, Ondřej Klimpera wrote: Hello, I'd like to ask you what is the preferred way of getting ...

Getting job progress in java application

2012-04-29 Thread Ondřej Klimpera
Hello, I'd like to ask what is the preferred way of getting running jobs' progress from the Java application that executed them. I'm using Hadoop 0.20.203 and tried the job.end.notification.url property, which works well, but as the property name says, it sends only job-end notifications. What I need ...
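A minimal sketch of the JobClient polling approach suggested in the reply above (old "mapred" API, Hadoop 0.20.x); the job id is illustrative and exception handling is elided:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

// Poll a submitted job's progress from the launching application.
JobClient client = new JobClient(new JobConf());
RunningJob job = client.getJob(JobID.forName("job_201204291200_0001"));
while (!job.isComplete()) {
  System.out.printf("map %.0f%%, reduce %.0f%%%n",
      job.mapProgress() * 100, job.reduceProgress() * 100);
  Thread.sleep(5000); // throws InterruptedException; handling elided
}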

Re: Setting a timeout for one Map() input processing

2012-04-18 Thread Ondřej Klimpera
Thanks, I'll try to implement it and let you know if it worked. On 04/18/2012 04:07 PM, Harsh J wrote: Since you're looking for per-line (and not per-task/file) monitoring, this is best done by your own application code (a monitoring thread, etc.). On Wed, Apr 18, 2012 at 6:09...

Setting a timeout for one Map() input processing

2012-04-18 Thread Ondřej Klimpera
Hello, I'd like to ask if there is a possibility of setting a timeout for processing one line of text input in the map function. The idea is that if processing one line takes too long, Hadoop would cut that work short and continue with the next input line. Thank you for your answer.
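One way to implement the monitoring idea from the reply above: run each line's processing under a Future and give up after a timeout. A sketch only; processLine() and the 10-second limit are illustrative, and the remaining checked exceptions are elided:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

private final ExecutorService worker = Executors.newSingleThreadExecutor();

// Inside map(): process one line under a time limit.
final String line = value.toString();
Future<String> result = worker.submit(new Callable<String>() {
  public String call() throws Exception {
    return processLine(line); // placeholder for the real per-line work
  }
});
try {
  output.collect(key, new Text(result.get(10, TimeUnit.SECONDS)));
} catch (TimeoutException e) {
  result.cancel(true); // abandon this line and continue with the next
  // caveat: processLine() must respond to interruption, or later lines
  // will queue behind the stuck one on the single worker thread
}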

Re: Creating and working with temporary file in a map() function

2012-04-08 Thread Ondřej Klimpera
...distributed mode. Do let us know if it does not work as intended. On Sun, Apr 8, 2012 at 11:40 PM, Ondřej Klimpera wrote: Thanks for your advice, File.createTempFile() works great, at least in pseudo-distributed mode; hope the cluster solution will do the same work. You saved me hours of trying... On...

Re: Creating and working with temporary file in a map() function

2012-04-08 Thread Ondřej Klimpera
...files would also be automatically deleted after the task attempt is done. On Sun, Apr 8, 2012 at 2:14 AM, Ondřej Klimpera wrote: Hello, I would like to ask you if it is possible to create and work with a temporary file while in a map function. I suppose that the map function is running on a single...

Creating and working with temporary file in a map() function

2012-04-07 Thread Ondřej Klimpera
Hello, I would like to ask if it is possible to create and work with a temporary file while in a map function. I suppose the map function runs on a single node in the Hadoop cluster, so what is a safe way to create a temporary file and read from it in one map() run? If it is possible...
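A sketch of the File.createTempFile() approach that worked in this thread (see the replies above), with illustrative content; each task attempt runs in its own working directory, so the file is private to that attempt:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;

// Inside map(): write intermediate data to a local temp file, then read it back.
File tmp = File.createTempFile("map-scratch-", ".tmp");
tmp.deleteOnExit();
BufferedWriter out = new BufferedWriter(new FileWriter(tmp));
out.write("intermediate data\n"); // illustrative content
out.close();
BufferedReader in = new BufferedReader(new FileReader(tmp));
String firstLine = in.readLine();
in.close();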

Re: Working with MapFiles

2012-04-02 Thread Ondřej Klimpera
...14:30, Ondřej Klimpera wrote: And one more question: is it even possible to add a MapFile (as it consists of index and data files) to the distributed cache? Thanks. Should be no problem, they are just two files. On 03/30/2012 01:15 PM, Ondřej Klimpera wrote: Hello, I'm not sure what you mean...

Re: Working with MapFiles

2012-03-30 Thread Ondřej Klimpera
And one more question: is it even possible to add a MapFile (as it consists of index and data files) to the distributed cache? Thanks On 03/30/2012 01:15 PM, Ondřej Klimpera wrote: Hello, I'm not sure what you mean by using MapReduce setup()? "If the file is that small you could load...

Re: Working with MapFiles

2012-03-30 Thread Ondřej Klimpera
...Eugen Stan wrote: Hello Ondrej, On 29.03.2012 18:05, Ondřej Klimpera wrote: Hello, I have a MapFile as a product of a MapReduce job, and what I need to do is: 1. If MapReduce produced multiple splits as output, merge them into a single file. 2. Copy this merged MapFile to another HDFS location...

Re: Working with MapFiles

2012-03-30 Thread Ondřej Klimpera
Hello, I've got one more question: how is the seek() (or get()) method implemented in MapFile.Reader? Does it use hashCode(), compareTo(), or another mechanism to find a match in the MapFile's index? Thanks for your reply. Ondrej Klimpera On 03/29/2012 08:26 PM, Ondřej Klimpera wrote: Thanks...
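For the record: MapFile lookups go by the key's sort order (its WritableComparator / compareTo()), not hashCode(). The reader binary-searches the small in-memory index for the closest preceding key, then scans the data file forward from that offset. A lookup sketch with illustrative Text key/value types:

import org.apache.hadoop.io.Text;

Text value = new Text();
// get() returns the value argument when the key is found, null otherwise
if (reader.get(new Text("someKey"), value) != null) {
  // "value" now holds the record stored under "someKey"
}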

Re: Working with MapFiles

2012-03-29 Thread Ondřej Klimpera
Thanks for your fast reply, I'll try this approach :) On 03/29/2012 05:43 PM, Deniz Demir wrote: Not sure if this helps in your use case, but you can put all output files into the distributed cache and then access them in the subsequent map-reduce job (in driver code): // previous mr-job's o...

Working with MapFiles

2012-03-29 Thread Ondřej Klimpera
Hello, I have a MapFile as a product of a MapReduce job, and what I need to do is: 1. If MapReduce produced multiple splits as output, merge them into a single file. 2. Copy this merged MapFile to another HDFS location and use it as a distributed cache file for another MapReduce job. I'm wondering if...
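As the replies above note, a MapFile directory is just two files, so both parts can be cached. A sketch with illustrative paths (old "mapred" API; URISyntaxException handling elided):

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

// In the second job's driver: ship the merged MapFile's two parts.
JobConf conf = new JobConf();
DistributedCache.addCacheFile(new URI("/merged/mapfile/data"), conf);
DistributedCache.addCacheFile(new URI("/merged/mapfile/index"), conf);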

Re: Using MultipleOutputs with new API (v1.0)

2012-01-25 Thread Ondřej Klimpera
...having deprecated things in my projects. Thanks. On 01/25/2012 01:46 PM, Ondřej Klimpera wrote: I'm using 1.0.0 beta; suppose it was a wrong decision to use a beta version. So do you recommend using 0.20.203.X and sticking to the Hadoop: The Definitive Guide approaches? Thanks for your reply On 01/25/...

Re: Using MultipleOutputs with new API (v1.0)

2012-01-25 Thread Ondřej Klimpera
...still recommend sticking to the stable API if you are using a 0.20.x/1.x stable Apache release. On Wed, Jan 25, 2012 at 5:13 PM, Ondřej Klimpera wrote: Hello, I'm trying to develop an application where the Reducer has to produce multiple outputs. In detail, I need the Reducer to produce two types...

Using MultipleOutputs with new API (v1.0)

2012-01-25 Thread Ondřej Klimpera
Hello, I'm trying to develop an application where the Reducer has to produce multiple outputs. In detail, I need the Reducer to produce two types of files, each with different output. I found in Hadoop: The Definitive Guide that the new API uses only MultipleOutputs, but working with Mu...
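A sketch of the stable old-API route recommended in the replies above (org.apache.hadoop.mapred.lib.MultipleOutputs, Hadoop 0.20.x/1.x); the output names and value types are illustrative:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// In the driver: declare the two named outputs.
JobConf conf = new JobConf();
MultipleOutputs.addNamedOutput(conf, "typeA", TextOutputFormat.class, Text.class, Text.class);
MultipleOutputs.addNamedOutput(conf, "typeB", TextOutputFormat.class, Text.class, Text.class);

// In the reducer: write each record to the appropriate named output.
private MultipleOutputs mos;

public void configure(JobConf job) {
  mos = new MultipleOutputs(job);
}

// inside reduce():
//   mos.getCollector("typeA", reporter).collect(key, valueForFirstFile);
//   mos.getCollector("typeB", reporter).collect(key, valueForSecondFile);

public void close() throws IOException {
  mos.close(); // required so the named outputs are flushed
}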