The Text class supports low-level access to the underlying byte array in the
Text object.

You can call getBytes() directly and then incrementally transcode the bytes
into characters using the java.nio.charset decoder tools (CharsetDecoder),
or call the charAt method to get the characters one by one.
Note that only the first getLength() bytes of the array returned by
getBytes() are valid.
The bytesToCodePoint method provides a simpler interface for sequentially
working through the data.
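
A minimal sketch of the incremental transcoding approach, using only the JDK
so it can be run without Hadoop on the classpath. The class name
ChunkedUtf8Decoder, the decode method, and the chunkChars and sink parameters
are made up for illustration; they are not part of the Hadoop API. It assumes
the bytes are UTF-8 (which is how Text stores its data) and that the caller
can process the characters piece by piece instead of holding one huge String.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper, not part of Hadoop.
public class ChunkedUtf8Decoder {

    /**
     * Decodes the first 'length' bytes of 'bytes' as UTF-8 and hands the
     * characters to 'sink' in pieces of at most 'chunkChars' chars each.
     */
    public static void decode(byte[] bytes, int length, int chunkChars,
                              Consumer<CharSequence> sink)
            throws CharacterCodingException {
        if (chunkChars < 2) {
            // A supplementary code point needs two chars (a surrogate pair).
            throw new IllegalArgumentException("chunkChars must be >= 2");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);

        // Wrap only the valid region; no copy of the byte array is made.
        ByteBuffer in = ByteBuffer.wrap(bytes, 0, length);
        // Small, reusable output buffer: this bounds the extra memory used.
        CharBuffer out = CharBuffer.allocate(chunkChars);

        while (true) {
            // All input is already in 'in', so endOfInput is always true.
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                result.throwException();
            }
            drain(out, sink);
            if (result.isUnderflow()) {
                break; // every input byte has been consumed
            }
            // Otherwise it was an overflow: 'out' was full, so loop again.
        }
        // Let the decoder emit any internally buffered output.
        while (true) {
            CoderResult result = decoder.flush(out);
            drain(out, sink);
            if (result.isUnderflow()) {
                break;
            }
        }
    }

    // Passes the decoded chars to the sink and resets the buffer for reuse.
    private static void drain(CharBuffer out, Consumer<CharSequence> sink) {
        out.flip();
        if (out.hasRemaining()) {
            sink.accept(out.toString()); // copy, since 'out' is reused
        }
        out.clear();
    }
}
```

With a Text value this would presumably be called along the lines of
ChunkedUtf8Decoder.decode(value.getBytes(), value.getLength(), 8192, handler),
where handler is whatever consumes the characters. This only helps if the
handler itself does not accumulate everything into a single String.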

On Thu, Oct 29, 2009 at 4:18 AM, bhushan_mahale <
bhushan_mah...@persistent.co.in> wrote:

> Hi,
>
> I am writing M-R code using the MapRunnable interface.
> The input format is SequenceFileInputFormat.
>
> Each Sequence-record contains a key-value pair of type <Text key,Text
> value> (Text: org.apache.hadoop.io.Text)
>
> The "key" Text object contains a small string, whereas the "value" Text
> object contains a large XML string.
> The "value" Text object can contain as much as 100 to 300 MB of data.
>
> I convert the "value" Text object to a String using the value.toString()
> method. It runs out of memory (OutOfMemory) for large data in the "value"
> object.
>
> Is there any other way to convert a large Text object to a Java String
> object?
> Alternatively, can I limit the number of records the RecordReader object
> passes to the run method so that total memory utilization is limited?
>
> Thanks,
> - Bhushan
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
