Re: Reading fields from a Text line
That is not really a bug. Only if you use @Override will you really be asserting that you've overridden the right method (since the new API uses inheritance instead of interfaces). Without that kind of check, it's easy to make mistakes and add methods that won't be considered by the framework (and hence the default IdentityMapper comes into play). Always use @Override annotations when inheriting and overriding methods.

On Fri, Aug 3, 2012 at 4:41 AM, Bejoy Ks bejoy.had...@gmail.com wrote:
Hi Tariq
On further analysis I noticed an odd behavior in this context. If we use the default InputFormat (TextInputFormat) but specify the key type in the mapper as IntWritable instead of LongWritable, the framework is supposed to throw a class cast exception. Such an exception is thrown only if the key types at the class level and the method level are the same (IntWritable) in the Mapper. But if we provide the input key type as IntWritable at the class level and LongWritable at the method level (map method), instead of a compile-time error the code compiles fine. In addition, on execution the framework triggers the identity mapper instead of the custom mapper provided with the configuration. This seems like a bug to me. Filed a JIRA to track this issue: https://issues.apache.org/jira/browse/MAPREDUCE-4507
Regards
Bejoy KS

--
Harsh J
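The pitfall Harsh describes can be illustrated outside Hadoop. The sketch below uses hypothetical classes (not the real Hadoop API): a base class provides a default identity-style method, and a subclass accidentally declares a method with a different parameter type. Without @Override the compiler accepts it as an overload, so the base method still runs:

```java
// Minimal sketch with hypothetical classes: the base class provides a
// default method, mirroring how the new-API Mapper behaves.
class BaseMapper {
    String map(long key, String value) {   // the "framework" calls this signature
        return value;                      // identity behaviour by default
    }
}

class CustomMapper extends BaseMapper {
    // Adding @Override above this method would make the compiler reject it,
    // because map(int, String) does NOT override map(long, String) --
    // it silently overloads it instead.
    String map(int key, String value) {
        return value.substring(0, 2);
    }
}

public class OverrideDemo {
    public static void main(String[] args) {
        BaseMapper m = new CustomMapper();
        // Dispatch goes to the long-keyed signature, so the base class's
        // identity method runs, not the custom overload.
        System.out.println(m.map(1L, "TT12345"));  // prints: TT12345
    }
}
```

With @Override on the subclass method, this mistake becomes a compile-time error instead of a silent fallback.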
Re: Reading fields from a Text line
That is a good pointer Harsh. Thanks a lot. But if the IdentityMapper is being used, shouldn't the job.xml reflect that? Job.xml always shows the mapper as our CustomMapper.
Regards
Bejoy KS
Sent from handheld, please excuse typos.

-----Original Message-----
From: Harsh J ha...@cloudera.com
Date: Fri, 3 Aug 2012 13:02:32
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
Cc: Mohammad Tariq donta...@gmail.com
Subject: Re: Reading fields from a Text line

That is not really a bug. Only if you use @Override will you really be asserting that you've overridden the right method (since the new API uses inheritance instead of interfaces). Without that kind of check, it's easy to make mistakes and add methods that won't be considered by the framework (and hence the default IdentityMapper comes into play). Always use @Override annotations when inheriting and overriding methods. ...

--
Harsh J
Re: Reading fields from a Text line
Bejoy,

In the new API, the default map() function, if not properly overridden, is the identity map function. There is no IdentityMapper class in the new API; the Mapper class itself is identity by default.

On Fri, Aug 3, 2012 at 1:07 PM, Bejoy KS bejoy.had...@gmail.com wrote:
That is a good pointer Harsh. Thanks a lot. But if the IdentityMapper is being used, shouldn't the job.xml reflect that? Job.xml always shows the mapper as our CustomMapper.
Regards
Bejoy KS

--
Harsh J
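Harsh's point, that the new-API base class is itself the identity, can be sketched in plain Java with a hypothetical generic class (not the actual Hadoop Mapper): with no subclass and no override, records pass through unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (hypothetical class, not the real Hadoop Mapper) of why
// the new API needs no IdentityMapper: the base map() is already identity.
class SketchMapper<KIN, VIN, KOUT, VOUT> {
    // Default behaviour: write the input pair through unchanged.
    void map(KIN key, VIN value, List<Object[]> out) {
        out.add(new Object[]{key, value});   // identity by default
    }
}

public class IdentityDefaultDemo {
    public static void main(String[] args) {
        // No subclass, no override: the base class alone behaves the way
        // the old API's IdentityMapper did.
        SketchMapper<Long, String, Long, String> m = new SketchMapper<>();
        List<Object[]> out = new ArrayList<>();
        m.map(1L, "hello", out);
        System.out.println(out.get(0)[1]);   // prints: hello
    }
}
```

This also explains the job.xml observation: the configuration still names the custom mapper class; it is only the unmatched map() signature that causes the inherited identity behaviour to run.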
Re: Reading fields from a Text line
Ok, got it now. That is a good piece of information. Thank you :)
Regards
Bejoy KS
Sent from handheld, please excuse typos.

-----Original Message-----
From: Harsh J ha...@cloudera.com
Date: Fri, 3 Aug 2012 16:28:27
To: mapreduce-user@hadoop.apache.org; bejoy.had...@gmail.com
Cc: Mohammad Tariq donta...@gmail.com
Subject: Re: Reading fields from a Text line

Bejoy,

In the new API, the default map() function, if not properly overridden, is the identity map function. There is no IdentityMapper class in the new API; the Mapper class itself is identity by default. ...

--
Harsh J
Re: Reading fields from a Text line
Thanks for the response Harsh and Sri. Actually, I was trying to prepare a template for my application with which I read one line at a time, extract the first field from it, and emit that extracted value from the mapper. I have these few lines of code for that:

    public static class XPTMapper extends Mapper<IntWritable, Text, LongWritable, Text> {

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            Text word = new Text();
            String line = value.toString();
            if (!line.startsWith("TT")) {
                context.setStatus("INVALID LINE.. SKIPPING");
            } else {
                String stdid = line.substring(0, 7);
                word.set(stdid);
                context.write(key, word);
            }
        }
    }

But the output file contains all the rows of the input file, including the lines I was expecting to get skipped. Also, I was expecting only the fields I am emitting, but the file contains entire lines. Could you guys please point out the mistake I might have made? (Pardon my ignorance, as I am not very good at MapReduce.) Many thanks.
Regards,
Mohammad Tariq

On Thu, Aug 2, 2012 at 10:58 AM, Sriram Ramachandrasekaran sri.ram...@gmail.com wrote:
Wouldn't it be better if you could skip those unwanted lines upfront (preprocess) and have a file which is ready to be processed by the MR system? In any case, more details are needed.

On Thu, Aug 2, 2012 at 8:23 AM, Harsh J ha...@cloudera.com wrote:
Mohammad,
What do you mean by "But it seems I am not doing things in correct way. Need some guidance."? What is your written code exactly expected to do, and what is it not doing? Perhaps, since you ask a code question here, can you share it with us (pastebin or gists, etc.)? For skipping 8 lines, if you are using splits, you need to detect within the mapper or your record reader whether the map task filesplit has an offset of 0, and skip 8 line reads if so (because it is the first split of some file). ...

--
Harsh J

--
It's just about how deep your longing is!
Re: Reading fields from a Text line
Hi Tariq,
Is your file splittable? If it's not, a single mapper will process the entire file in one go!
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html#isSplitable%28org.apache.hadoop.mapreduce.JobContext,%20org.apache.hadoop.fs.Path%29
How many mappers are being created? See if that helps.
Regards,
Alok

On Thu, Aug 2, 2012 at 3:48 PM, Mohammad Tariq donta...@gmail.com wrote:
Thanks for the response Harsh and Sri. Actually, I was trying to prepare a template for my application with which I read one line at a time, extract the first field from it, and emit that extracted value from the mapper. ...

--
Alok Kumar
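For the opposite case, deliberately keeping a file in one split so a single mapper sees all its lines in order, the isSplitable() hook Alok links to can be overridden. A sketch, assuming the new-API TextInputFormat and the Hadoop libraries on the classpath (shown only as an illustration, not tied to this job):

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

// Sketch only (requires Hadoop on the classpath): a non-splittable text
// input format, so one mapper reads each whole file start to finish.
public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false;   // never split: one map task per file
    }
}
```

Configured via job.setInputFormatClass(WholeFileTextInputFormat.class), this trades parallelism for ordered, whole-file reads, which can simplify header skipping.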
Re: Reading fields from a Text line
Hi Tariq
I assume the mapper being used is the IdentityMapper instead of your XPTMapper class. Can you share your main class? If you are using TextInputFormat and reading from a file in HDFS, it should have LongWritable keys as input, and your code has IntWritable as the input key type. Have a check on that as well.
Regards
Bejoy KS
Sent from handheld, please excuse typos.

-----Original Message-----
From: Mohammad Tariq donta...@gmail.com
Date: Thu, 2 Aug 2012 15:48:42
To: mapreduce-user@hadoop.apache.org
Reply-To: mapreduce-user@hadoop.apache.org
Subject: Re: Reading fields from a Text line

Thanks for the response Harsh and Sri. Actually, I was trying to prepare a template for my application with which I read one line at a time, extract the first field from it, and emit that extracted value from the mapper. ...
Re: Reading fields from a Text line
Thank you everyone. Here is the code from the driver:

    Configuration conf = new Configuration();
    conf.addResource(new Path("/home/cluster/hadoop-1.0.3/conf/core-site.xml"));
    conf.addResource(new Path("/home/cluster/hadoop-1.0.3/conf/hdfs-site.xml"));
    Job job = new Job(conf, "XPTReader");
    job.setJarByClass(XPTReader.class);
    job.setMapperClass(XPTMapper.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    job.setInputFormatClass(TextInputFormat.class);
    Path inPath = new Path("/mapin/TX.xpt");
    FileInputFormat.addInputPath(job, inPath);
    FileOutputFormat.setOutputPath(job, new Path("/mapout/"
            + inPath.toString().split("/")[4]
            + new java.util.Random().nextInt()));
    System.exit(job.waitForCompletion(true) ? 0 : 1);

Bejoy: I have observed one strange thing. When I am using IntWritable, the output file contains the entire content of the input file, but if I am using LongWritable, the output file is empty.
Sri: the code is working outside MR.
Regards,
Mohammad Tariq

On Thu, Aug 2, 2012 at 4:38 PM, Bejoy KS bejoy.had...@gmail.com wrote:
Hi Tariq
I assume the mapper being used is the IdentityMapper instead of your XPTMapper class. Can you share your main class? If you are using TextInputFormat and reading from a file in HDFS, it should have LongWritable keys as input, and your code has IntWritable as the input key type. Have a check on that as well. ...

--
Harsh J

--
It's just about how deep your longing is!
Re: Reading fields from a Text line
Hi Tariq
Again, I strongly suspect the IdentityMapper is in play here. The reasoning why I suspect so: when you get the whole data in the output file, it should be the IdentityMapper. Due to the mismatch between the input key type at the class level and at the method level, the framework is falling back to the IdentityMapper. I have noticed this fallback while using the new mapreduce API.

    public static class XPTMapper extends Mapper<IntWritable, Text, LongWritable, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

When you change the input key type to LongWritable at the class level, it is your custom mapper (XPTMapper) being called. Because of some exceptional cases it is just going into the if branch, where you are not writing anything out of the mapper, hence an empty output file.

    public static class XPTMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {

To cross-check this, try enabling some logging in your code to see exactly what is happening. By the way, are you getting the output of this line in your logs when you change the input key type to LongWritable?

    context.setStatus("INVALID LINE.. SKIPPING");

If so, that confirms my assumption. :) Try adding more logs to trace the flow and see what is going wrong. Or you can use MRUnit to unit test your code as the first step. Hope it helps!
Regards
Bejoy KS
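Putting Bejoy's two fixes together, matching class-level and method-level key types and guarding the signature with @Override, the mapper fragment would look like this. This is a sketch assuming Hadoop's new API (org.apache.hadoop.mapreduce), not a tested replacement for the original class:

```java
// Sketch (Hadoop new API assumed): class-level and method-level input key
// types now agree (both LongWritable), and @Override makes the compiler
// verify the signature instead of silently accepting an overload.
public static class XPTMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        if (!line.startsWith("TT")) {
            context.setStatus("INVALID LINE.. SKIPPING");
            return;                          // skip invalid lines explicitly
        }
        Text word = new Text(line.substring(0, 7));
        context.write(key, word);
    }
}
```

With this shape, an empty output points at the if branch (no line starting with "TT"), which is exactly what the suggested logging would confirm.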
Re: Reading fields from a Text line
Hi Tariq
On further analysis I noticed an odd behavior in this context. If we use the default InputFormat (TextInputFormat) but specify the key type in the mapper as IntWritable instead of LongWritable, the framework is supposed to throw a class cast exception. Such an exception is thrown only if the key types at the class level and the method level are the same (IntWritable) in the Mapper. But if we provide the input key type as IntWritable at the class level and LongWritable at the method level (map method), instead of a compile-time error the code compiles fine. In addition, on execution the framework triggers the identity mapper instead of the custom mapper provided with the configuration. This seems like a bug to me. Filed a JIRA to track this issue: https://issues.apache.org/jira/browse/MAPREDUCE-4507
Regards
Bejoy KS
Reading fields from a Text line
Hello list,
I have a flat file in which data is stored as lines of 107 bytes each. I need to skip the first 8 lines (as they don't contain any valuable info). Thereafter, I have to read each line and extract the information from it, but not the line as a whole. Each line is composed of several fields without any delimiter between them. For example, the first field is 8 bytes, the second 2 bytes, and so on. I was trying to read each line as a Text value, convert it into a String, and use the String.substring() method to extract the value of each field. But it seems I am not doing things the correct way. Need some guidance. Many thanks.
Regards,
Mohammad Tariq
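The delimiter-free layout described above is a fixed-width record, and substring() is indeed the natural tool. A plain-Java sketch (no Hadoop needed), using the example widths from the message (8-byte first field, 2-byte second); real widths would come from the file's actual record layout:

```java
// Plain-Java sketch of fixed-width field extraction. The widths array
// (8 bytes, then 2 bytes) follows the example in the message and is a
// placeholder for the file's real record layout.
public class FixedWidthDemo {
    static final int[] WIDTHS = {8, 2};

    static String[] parse(String line) {
        String[] fields = new String[WIDTHS.length];
        int pos = 0;
        for (int i = 0; i < WIDTHS.length; i++) {
            fields[i] = line.substring(pos, pos + WIDTHS[i]);  // cut one field
            pos += WIDTHS[i];                                  // advance offset
        }
        return fields;
    }

    public static void main(String[] args) {
        // A made-up record: 8-byte field, 2-byte field, then the rest.
        String record = "TT123456XY-rest-of-the-107-byte-record";
        String[] f = parse(record);
        System.out.println(f[0] + "|" + f[1]);   // prints: TT123456|XY
    }
}
```

Inside a mapper this same parse would run on value.toString(), with only the extracted fields (not the whole line) written to the context.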
Re: Reading fields from a Text line
Mohammad,
What do you mean by "But it seems I am not doing things in correct way. Need some guidance."? What is your written code exactly expected to do, and what is it not doing? Perhaps, since you ask a code question here, can you share it with us (pastebin or gists, etc.)? For skipping 8 lines, if you are using splits, you need to detect within the mapper or your record reader whether the map task filesplit has an offset of 0, and skip 8 line reads if so (because it is the first split of some file).

On Thu, Aug 2, 2012 at 1:54 AM, Mohammad Tariq donta...@gmail.com wrote:
Hello list,
I have a flat file in which data is stored as lines of 107 bytes each. I need to skip the first 8 lines (as they don't contain any valuable info). ...

--
Harsh J
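Harsh's header-skipping rule, discard the first 8 lines only in the reader whose split starts at byte offset 0, can be sketched outside Hadoop with a hypothetical helper (plain Java, a StringReader standing in for a file split):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch of the split-offset rule: only the reader handling the
// split at byte offset 0 of the file discards the header lines.
public class SkipHeaderDemo {
    static final int HEADER_LINES = 8;

    static List<String> readLines(BufferedReader in, long splitOffset)
            throws IOException {
        List<String> out = new ArrayList<>();
        if (splitOffset == 0) {                      // first split of the file
            for (int i = 0; i < HEADER_LINES; i++) {
                in.readLine();                       // discard one header line
            }
        }
        for (String line; (line = in.readLine()) != null; ) {
            out.add(line);                           // keep every data line
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= 10; i++) sb.append("line").append(i).append('\n');
        List<String> data =
            readLines(new BufferedReader(new StringReader(sb.toString())), 0L);
        System.out.println(data.size() + " " + data.get(0));  // prints: 2 line9
    }
}
```

In a real job the same check would live in the mapper's setup() or a custom RecordReader, with the offset taken from the FileSplit.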
Re: Reading fields from a Text line
Wouldn't it be better if you could skip those unwanted lines upfront (preprocess) and have a file which is ready to be processed by the MR system? In any case, more details are needed.

On Thu, Aug 2, 2012 at 8:23 AM, Harsh J ha...@cloudera.com wrote:
Mohammad,
What do you mean by "But it seems I am not doing things in correct way. Need some guidance."? What is your written code exactly expected to do, and what is it not doing? ...

--
Harsh J

--
It's just about how deep your longing is!