Re: server side DAG of jobs in YARN

2012-08-01 Thread Radim Kolar



> I am working on the server side DAG execution actually. Here is a JIRA.
>
> https://issues.apache.org/jira/browse/MAPREDUCE-4495

Do you have some code in hand?


Re: server side DAG of jobs in YARN

2012-08-01 Thread Bo Wang
Not working yet. I expect to finish it in one month or so.

On Tue, Jul 31, 2012 at 3:15 AM, Radim Kolar  wrote:

>
>> I am working on the server side DAG execution actually. Here is a JIRA.
>>
>> https://issues.apache.org/jira/browse/MAPREDUCE-4495
>>
> Do you have some code in hand?
>


Reading fields from a Text line

2012-08-01 Thread Mohammad Tariq
Hello list,

   I have a flat file in which data is stored as lines of 107
bytes each. I need to skip the first 8 lines (as they don't contain any
valuable info). Thereafter, I have to read each line and extract the
information from it, but not the line as a whole. Each line is
composed of several fields without any delimiter between them. For
example, the first field is 8 bytes, the second 2 bytes, and so on. I
was trying to read each line as a Text value, convert it into a String,
and use the String.substring() method to extract the value of each
field. But it seems I am not doing things in the correct way. Need some
guidance. Many thanks.

Regards,
Mohammad Tariq
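A minimal sketch of the substring approach described above, in plain Java. Only the first two field widths (8 and 2 bytes) come from the message; the rest of the 107-byte layout is unknown, so the WIDTHS array here is illustrative. Note that String.substring() indexes characters, not bytes, so this assumes a single-byte encoding such as ASCII.

```java
// Hypothetical fixed-width parser; extend WIDTHS to cover all 107 bytes.
public class FixedWidthParser {
    // Character widths of the leading fields (first two taken from the post).
    static final int[] WIDTHS = {8, 2};

    public static String[] parse(String line) {
        String[] fields = new String[WIDTHS.length];
        int offset = 0;
        for (int i = 0; i < WIDTHS.length; i++) {
            // Cut out the next fixed-width field and advance the offset.
            fields[i] = line.substring(offset, offset + WIDTHS[i]);
            offset += WIDTHS[i];
        }
        return fields;
    }
}
```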


Issue with Hadoop Streaming

2012-08-01 Thread Devi Kumarappan
I am trying to run Hadoop streaming with a Perl script as the mapper and no
reducer. My requirement is for the mapper to run on one file at a time, since
I have to do pattern processing on the entire contents of one file at a time
and the file size is small.

The Hadoop streaming manual suggests the following solution:
* Generate a file containing the full HDFS paths of the input files. Each map
task would get one file name as input.
* Create a mapper script which, given a filename, will get the file to local
disk, gzip the file, and put it back in the desired output directory.

I am running the following command:
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
  -input /user/devi/file.txt -output /user/devi/s_output \
  -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"

 
/user/devi/file.txt contains the following two lines:
/user/devi/s_input/a.txt
/user/devi/s_input/b.txt

When this runs, instead of spawning two mappers for a.txt and b.txt as per the
document, only one mapper is spawned, and the Perl script gets both
/user/devi/s_input/a.txt and /user/devi/s_input/b.txt as input.

How can I make the Perl mapper script run on only one file at a time?

Appreciate your help. Thanks, Devi
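One commonly used way to get the one-filename-per-mapper behaviour (an assumption on my part, not something confirmed in this thread) is to have streaming use NLineInputFormat, which puts each line of file.txt into its own split and hence its own map task. A sketch of the adjusted command, reusing the jar and paths from the message above:

```
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
  -D mapred.line.input.format.linespermap=1 \
  -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
  -input /user/devi/file.txt -output /user/devi/s_output \
  -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"
```

The Perl script would then receive a single filename on stdin per task and fetch that file from HDFS itself, as the streaming manual's recipe describes.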

Re: Reading fields from a Text line

2012-08-01 Thread Harsh J
Mohammad,

> But it seems I am not doing things in the correct way. Need some guidance.

What do you mean by the above? What is your written code exactly
expected to do and what is it not doing? Perhaps since you ask for a
code question here, can you share it with us (pastebin or gists,
etc.)?

For skipping 8 lines, if you are using splits, you need to detect
within the mapper or your record reader whether the map task's filesplit
has an offset of 0, and skip 8 line reads if so (because that means it is
the first split of some file).

On Thu, Aug 2, 2012 at 1:54 AM, Mohammad Tariq  wrote:
> Hello list,
>
>    I have a flat file in which data is stored as lines of 107
> bytes each. I need to skip the first 8 lines (as they don't contain any
> valuable info). Thereafter, I have to read each line and extract the
> information from it, but not the line as a whole. Each line is
> composed of several fields without any delimiter between them. For
> example, the first field is 8 bytes, the second 2 bytes, and so on. I
> was trying to read each line as a Text value, convert it into a String,
> and use the String.substring() method to extract the value of each
> field. But it seems I am not doing things in the correct way. Need some
> guidance. Many thanks.
>
> Regards,
> Mohammad Tariq



-- 
Harsh J
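The offset check Harsh describes can be sketched in plain Java. The Hadoop types are omitted here; in a real mapper or record reader, splitStart would come from the split's starting byte offset (e.g. FileSplit.getStart()).

```java
import java.io.BufferedReader;
import java.io.IOException;

// Sketch: discard the 8 header lines only when this split begins at
// byte 0 of the file, i.e. it is the file's first split.
public class HeaderSkip {
    static final int HEADER_LINES = 8;

    public static void skipHeaderIfFirstSplit(BufferedReader reader,
                                              long splitStart) throws IOException {
        if (splitStart == 0) {
            for (int i = 0; i < HEADER_LINES; i++) {
                reader.readLine(); // throw away one header line
            }
        }
    }
}
```

Splits with a non-zero start offset are left untouched, since they begin somewhere in the middle of the file and carry no header.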


Re: Reading fields from a Text line

2012-08-01 Thread Sriram Ramachandrasekaran
Wouldn't it be better if you could skip those unwanted lines
upfront (preprocess) and have a file which is ready to be processed by the
MR system? In any case, more details are needed.

On Thu, Aug 2, 2012 at 8:23 AM, Harsh J  wrote:

> Mohammad,
>
> > But it seems I am not doing things in the correct way. Need some guidance.
>
> What do you mean by the above? What is your written code exactly
> expected to do and what is it not doing? Perhaps since you ask for a
> code question here, can you share it with us (pastebin or gists,
> etc.)?
>
> For skipping 8 lines, if you are using splits, you need to detect
> within the mapper or your record reader whether the map task's filesplit
> has an offset of 0, and skip 8 line reads if so (because that means it is
> the first split of some file).
>



-- 
It's just about how deep your longing is!