[android-developers] Re: Displaying unicode in a TextView?

2010-12-02 Thread Bob Kerns
AIIY

You might want to check the documentation before you make such
statements: 
http://developer.android.com/intl/de/reference/java/lang/String.html#String(byte[])

Do *NOT*, under *ANY* circumstance, omit that second argument.

Just because it works, TODAY, on YOUR device, does not make this
correct programming.

Your conclusion, however, is correct - char-at-a-time conversion
cannot work in all cases, and byte-at-a-time is even worse. You really
do have to treat it as a byte stream. Just make sure you nail down the
encoding of that byte stream.

For more on this, see the last few items on my blog:

http://bobkerns.typepad.com/bob_kerns_thinking/2010/12/yet-more-about-utf-8-the-evils-of-platform-defaults.html

In addition to that article, note the subsequent link to the Oracle
site. Among many other topics, that article discusses the new (as of
Java 5) APIs to handle conversion of surrogate characters properly,
and this is an area where char-at-a-time encoding fails.
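
A minimal sketch of the Java 5 code point APIs mentioned above, showing why a single char is not always a whole character. The class and method names (SurrogateDemo, codePoints) are made up for this illustration; the String and Character calls are the standard ones.

```java
// Sketch: iterate a String by code point instead of by char.
final class SurrogateDemo {
    // Returns one int per Unicode character, joining surrogate pairs.
    static int[] codePoints(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int i = 0;
        for (int offset = 0; offset < s.length(); ) {
            int cp = s.codePointAt(offset);
            result[i++] = cp;
            // Advance by 1 char for BMP characters, 2 for a surrogate pair.
            offset += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP,
        // so Java stores it as two chars.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("chars: " + clef.length());
        System.out.println("code points: " + codePoints(clef).length);
    }
}
```

Code that walks such a string one char at a time sees two half-characters, neither of which can be encoded on its own.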

On Dec 1, 4:51 pm, HippoMan <hippo.mail...@gmail.com> wrote:
> I should clarify that I now don't need to do this:
>
>     String content = new String(bytes, "UTF-8");
>
> This is because java's default is unicode. I get the same result with
> or without the second argument to the String constructor.
>
> I now see that my original error resulted because I was converting to
> individual chars and appending them one-by-one, instead of dealing
> with a mass of bytes, which then could have been properly converted to
> unicode.

-- 
You received this message because you are subscribed to the Google
Groups Android Developers group.
To post to this group, send email to android-developers@googlegroups.com
To unsubscribe from this group, send email to
android-developers+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/android-developers?hl=en


[android-developers] Re: Displaying unicode in a TextView?

2010-12-02 Thread HippoMan
Ah, yes. I see that I just happened to luck out, as the file.encoding
property on my device must be (currently!) set to utf-8.

Thanks.

This begs another, related question: how do I know what encoding to
use, in the first place ... for a TextView in the Android environment?

If I cannot count on the file.encoding property to always be set
correctly when using a TextView in Android, where do I query for the
correct encoding value?



[android-developers] Re: Displaying unicode in a TextView?

2010-12-02 Thread HippoMan
PS: I am writing Android-specific code. The class I am using will
never work outside of the Android environment, for reasons that go
beyond the issue of character encoding.

So does this mean that in my case, I _should_ do the moral equivalent
of this?

    String content = new String(bytes,
            System.getProperty("file.encoding"));

If so, this would indeed reduce to the following:

    String content = new String(bytes);



Re: [android-developers] Re: Displaying unicode in a TextView?

2010-12-02 Thread Kostya Vasilyev

On 02.12.2010 13:47, HippoMan wrote:

> This begs another, related question: how do I know what encoding to
> use, in the first place ... for a TextView in the Android environment?



You don't. This is not a TextView encoding issue.

TextView works with Java strings, which are always Unicode.


> If I cannot count on the file.encoding property to always be set
> correctly when using a TextView in Android, where do I query for the
> correct encoding value?


The issue arose when constructing a Java String from a byte array.

As Bob pointed out, that's where you need to specify the encoding, by
always passing the second argument to String(byte[], String).


The encoding used here needs to match the encoding that was used to 
construct the byte array (first argument).
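
A small sketch of the point above: the same text produces different bytes under different encodings, so decoding only works when the charset matches the one used to encode. MatchingCharsetDemo and roundTrip are illustrative names, and StandardCharsets assumes Java 7 or later (on older platforms, Charset.forName("UTF-8") would stand in).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: decoding must use the charset the bytes were encoded with.
final class MatchingCharsetDemo {
    // Encodes text with one charset and decodes the bytes with another.
    static String roundTrip(String text, Charset encodeWith, Charset decodeWith) {
        byte[] bytes = text.getBytes(encodeWith);
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String text = "\u201CHi\u201D"; // curly quotes around "Hi"
        // Matching charsets: the text survives.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8)));
        // Mismatched charsets: the text does not survive.
        System.out.println(text.equals(
                roundTrip(text, StandardCharsets.UTF_16BE, StandardCharsets.UTF_8)));
    }
}
```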


If those bytes came from a file, then you should somehow know the encoding.

Some Unicode files start with a marker (0xFEFF or 0xFFFE), but checking 
for this marker isn't reliable and shouldn't be necessary.


Here is some code that is a little simpler than using your own byte
array:


    InputStream stream = ...; // can be a file input stream
    InputStreamReader streamReader = new InputStreamReader(stream, "UTF-8");
    BufferedReader textReader = new BufferedReader(streamReader);

    // read from textReader
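
One possible way to fill in the "read from textReader" step, as a self-contained sketch. ReaderDemo and readAll are hypothetical names, and the Charset is passed in by the caller rather than hard-coded. The reader keeps decoder state between reads, so a multi-byte character that straddles an internal buffer boundary is still decoded correctly.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Sketch: read an entire stream as text in a known encoding.
final class ReaderDemo {
    static String readAll(InputStream stream, Charset charset) throws IOException {
        BufferedReader textReader =
                new BufferedReader(new InputStreamReader(stream, charset));
        try {
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[4096];
            int n;
            // read() returns -1 once the underlying stream is exhausted.
            while ((n = textReader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
            return sb.toString();
        } finally {
            textReader.close(); // also closes the wrapped stream
        }
    }
}
```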

--
Kostya Vasilyev -- WiFi Manager + pretty widget -- http://kmansoft.wordpress.com



Re: [android-developers] Re: Displaying unicode in a TextView?

2010-12-02 Thread Kostya Vasilyev

On 02.12.2010 13:54, HippoMan wrote:

> PS: I am writing Android-specific code. The class I am using will
> never work outside of the Android environment, for reasons that go
> beyond the issue of character encoding.
>
> So does this mean that in my case, I _should_ do the moral equivalent
> of this?
>
>     String content = new String(bytes,
>             System.getProperty("file.encoding"));



No, you shouldn't.

file.encoding is a system-wide property, and if it matches *your 
application's* content, it's only by pure luck.


These are your files, you should know what encoding they are in.

If they are UTF-8, go ahead and specify that encoding in your code.

A side note - perhaps every actual Android firmware sets file.encoding 
to UTF-8, but I don't see any guarantees to that in the SDK documentation.
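
To make the distinction concrete, here is a hedged sketch contrasting the one-argument constructor, which decodes with whatever the platform default happens to be, with an explicit charset, which behaves the same on every device. DefaultCharsetDemo and its methods are illustrative names; StandardCharsets assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: platform-default decoding versus explicit decoding.
final class DefaultCharsetDemo {
    // Depends on the runtime's default charset.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // Independent of any system setting.
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // What this particular runtime would use when no charset is given.
        System.out.println("default charset: " + Charset.defaultCharset());
        byte[] bytes = "\u201C".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeExplicit(bytes).equals(decodeWithDefault(bytes)));
    }
}
```

The last line prints true only where the default happens to be UTF-8, which is the "pure luck" case described above.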



> If so, this would indeed reduce to the following:
>
>     String content = new String(bytes);




--
Kostya Vasilyev -- WiFi Manager + pretty widget -- http://kmansoft.wordpress.com



[android-developers] Re: Displaying unicode in a TextView?

2010-12-02 Thread HippoMan
Yes. I should have written this after drinking my morning coffee. On
my way to work, I woke up a little and remembered that this encoding
pertains to the _source_ (in my case, the epub bundle) and not the
_destination_ (the TextView).

Luckily, I know something about the source: epubs are supposed to be
encoded in UTF-8, so this will be the default encoding that I will
use. I can give the user an option to change this for epub books that
are encoded in a non-standard manner.
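
A sketch of how such a user option might be resolved, assuming the setting arrives as a charset name in a string. CharsetChoice and resolve are hypothetical names; the fallback to UTF-8 reflects the default described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch: turn a user-supplied charset name into a Charset, defaulting to UTF-8.
final class CharsetChoice {
    static Charset resolve(String userChoice) {
        if (userChoice == null || userChoice.trim().isEmpty()) {
            return StandardCharsets.UTF_8; // no override given
        }
        try {
            return Charset.forName(userChoice.trim());
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return StandardCharsets.UTF_8;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));
        System.out.println(resolve("ISO-8859-1"));
        System.out.println(resolve("no-such-charset"));
    }
}
```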




[android-developers] Re: Displaying unicode in a TextView?

2010-12-01 Thread HippoMan
Thank you.

I have checked the items that I am displaying, and all of them contain
tags like this:

  <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

In every case, the charset is specified as utf-8 or UTF-8.

Apparently, this is not sufficient to cause the text to be interpreted
as containing unicode. What can I do to ensure this?



Re: [android-developers] Re: Displaying unicode in a TextView?

2010-12-01 Thread Frank Weiss
It would help if you were more specific about the problem.

1) What Unicode character code are you setting in the TextView that displays
as garbage?
2) What exactly does garbage mean, rectangles, or unexpected characters like
upside down question marks, etc?
3) Please post the code you are using to get the contents over the network.
I suspect the issue is converting a stream of bytes to characters.


[android-developers] Re: Displaying unicode in a TextView?

2010-12-01 Thread HippoMan
Thank you very much.

1) I am not explicitly setting any unicode character code in the
TextView. I display data that exists within e-books that are stored in
epub format. I do not alter this data at all. I display it as is. Most
of this displays OK, but quotes look like the following garbage ...

2) Open double quote:   a-circumflex,
                        something similar to a Euro symbol,
                        a scrunched together o-e

   Close double quote:  a-circumflex,
                        something similar to a Euro symbol,
                        question mark in a black diamond

3) I do not get the contents over the network. These are e-books in
epub format that already reside on my sdcard. The same text is rendered
perfectly (i.e., not as mojibake) by other e-readers such as Laputa,
Aldiko, FBReader, Nook, Kindle, etc.

This is the code that reads the data from the epub file.

// this.file is a String which contains the pathname to the
// e-book on my sdcard.

// this.itemMap is a LinkedHashMap which holds the contents
// of each item in the e-book, keyed by the name of the item.
// It's initialized to null and instantiated via lazy
// evaluation.

// More extensive error handling will be added later.

private boolean readEpubFile() {
    FileInputStream f = null;
    ZipInputStream  z = null;
    try {
        f = new FileInputStream(this.file);
        z = new ZipInputStream(f);
        ZipEntry ze;
        while ((ze = z.getNextEntry()) != null) {
            StringBuilder sb = new StringBuilder();
            for (int c = z.read(); c >= 0; c = z.read()) {
                sb.append((char) c);
            }
            String name = ze.getName();
            if (this.itemMap == null) {
                this.itemMap = new LinkedHashMap<String, String>();
            }
            this.itemMap.put(name, sb.toString());
        }
    }
    catch (Throwable t) {
        return (false);
    }
    finally {
        try {
            z.close();
        }
        catch (Throwable t) {
        }
        try {
            f.close();
        }
        catch (Throwable t) {
        }
    }
    return (true);
}
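
A short sketch of what the byte-by-byte loop above does to a UTF-8 curly quote. ByteCastDemo and castEachByte are illustrative names; the cast reproduces the (char) c step from the loop. The opening double quote U+201C is the three bytes E2 80 9C in UTF-8, and casting each byte separately yields three unrelated chars instead of one.

```java
import java.nio.charset.StandardCharsets;

// Sketch: casting each byte to char versus decoding the bytes as UTF-8.
final class ByteCastDemo {
    // Mimics the loop above: every byte value (0-255) becomes its own char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "\u201C".getBytes(StandardCharsets.UTF_8); // E2 80 9C
        String wrong = castEachByte(bytes);
        String right = new String(bytes, StandardCharsets.UTF_8);
        System.out.println("chars after casting: " + wrong.length());
        System.out.println("chars after decoding: " + right.length());
    }
}
```

The first of the three chars is U+00E2 (a-circumflex). The other two, U+0080 and U+009C, are control codes that a renderer falling back to Windows-1252 would draw as a Euro sign and an o-e ligature, which is consistent with the garbage described in item 2.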



[android-developers] Re: Displaying unicode in a TextView?

2010-12-01 Thread HippoMan
PS: I forgot to add that after I build this map, I just do the
following to display the text within my TextView:

// this.section is a String which holds the name of the
// ebook section that I want to view. It must be a key
// to the above-mentioned LinkedHashMap containing the
// epub data.

// this.view is the TextView object

// this.text is a String

this.text = this.itemMap.get(this.section);
if (this.text != null) {
    this.view.setText(Html.fromHtml(this.text).toString());
}



[android-developers] Re: Displaying unicode in a TextView?

2010-12-01 Thread HippoMan
OK. I figured it out after thinking more about what you said in your
item 3. I need to convert the bytes that come out of the zip file into
correct unicode. I changed the method as follows, and now it renders
the characters properly:

private boolean readEpubFile() {
    FileInputStream f      = null;
    ZipInputStream  z      = null;
    byte[]          buffer = new byte[65536];
    try {
        f = new FileInputStream(this.file);
        z = new ZipInputStream(f);
        ZipEntry ze;
        while ((ze = z.getNextEntry()) != null) {
            StringBuilder sb = new StringBuilder();
            int totlen = 0;
            int len    = 0;
            while ((len = z.read(buffer)) > 0) {
                sb.append(new String(buffer, 0, len, "UTF-8"));
                totlen += len;
            }
            String name = ze.getName();
            if (this.itemMap == null) {
                this.itemMap = new LinkedHashMap<String, String>();
            }
            this.itemMap.put(name, sb.toString());
        }
    }
    catch (Throwable t) {
        return (false);
    }
    finally {
        try {
            z.close();
        }
        catch (Throwable t) {
        }
        try {
            f.close();
        }
        catch (Throwable t) {
        }
    }
    return (true);
}
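
One caveat with the loop above is that it decodes each buffer on its own, so a multi-byte character that straddles two reads gets split. The sketch below simulates that with a deliberately tiny chunk size; ChunkBoundaryDemo and its methods are illustrative names, and StandardCharsets assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Sketch: decoding chunk by chunk versus decoding the accumulated bytes.
final class ChunkBoundaryDemo {
    // Decodes every chunk separately, like the loop above.
    static String decodePerChunk(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Collects all chunks first, then decodes once.
    static String decodeWhole(byte[] data, int chunkSize) {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            all.write(data, off, len);
        }
        return new String(all.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "caf\u00E9"; // the e-acute is two bytes in UTF-8
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        // A chunk size of 4 cuts the two-byte character in half.
        System.out.println(text.equals(decodePerChunk(data, 4)));
        System.out.println(text.equals(decodeWhole(data, 4)));
    }
}
```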


Thanks again to all of you for your help.



[android-developers] Re: Displaying unicode in a TextView?

2010-12-01 Thread HippoMan
... but I should actually use a ByteArrayOutputStream to avoid
breaking up unicode characters that might span the 65536-byte boundary
of my input buffer:

private boolean readEpubFile() {
    FileInputStream f        = null;
    ZipInputStream  z        = null;
    byte            buffer[] = new byte[65536];
    try {
        f = new FileInputStream(this.file);
        z = new ZipInputStream(f);
        ZipEntry ze;
        while ((ze = z.getNextEntry()) != null) {
            ByteArrayOutputStream bs = new ByteArrayOutputStream();
            int len = 0;
            while ((len = z.read(buffer)) > 0) {
                bs.write(buffer, 0, len);
            }
            byte[] bytes = bs.toByteArray();
            if (this.itemMap == null) {
                this.itemMap = new LinkedHashMap<String, String>();
            }
            String name    = ze.getName();
            String content = new String(bytes);
            this.itemMap.put(name, content);
        }
    }
    catch (Throwable t) {
        return (false);
    }
    return (true);
}
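
For comparison, a self-contained sketch of the same approach that names the encoding explicitly and closes the stream. ZipTextReader and readEntries are hypothetical names, not part of the code above; it assumes Java 7 or later for try-with-resources and StandardCharsets, and it assumes every entry is UTF-8 text, which will not hold for images or other binary items in an epub.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Sketch: read each zip entry fully, then decode it once as UTF-8.
final class ZipTextReader {
    static Map<String, String> readEntries(InputStream in) throws IOException {
        Map<String, String> items = new LinkedHashMap<String, String>();
        byte[] buffer = new byte[65536];
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int len;
                // read() returns -1 at the end of the current entry.
                while ((len = zip.read(buffer)) != -1) {
                    bytes.write(buffer, 0, len);
                }
                // Decode only after all bytes of the entry are collected.
                items.put(entry.getName(),
                        new String(bytes.toByteArray(), StandardCharsets.UTF_8));
            }
        }
        return items;
    }
}
```

A caller would pass something like a FileInputStream for the book and look sections up by entry name in the returned map.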



[android-developers] Re: Displaying unicode in a TextView?

2010-12-01 Thread HippoMan
I should clarify that I now don't need to do this:

    String content = new String(bytes, "UTF-8");

This is because java's default is unicode. I get the same result with
or without the second argument to the String constructor.

I now see that my original error resulted because I was converting to
individual chars and appending them one-by-one, instead of dealing
with a mass of bytes, which then could have been properly converted to
unicode.



[android-developers] Re: Displaying unicode in a TextView?

2010-11-30 Thread HippoMan
Thanks. I'll check the content variable later today or tomorrow and
post back here.

In the mean time, I'm wondering if perhaps this isn't a unicode issue,
after all. Upon closer examination, it seems that the only characters
that appear as garbage in the data I'm examining are quote characters
and apostrophes. I vaguely remember seeing this problem in the past
with some Microsoft character sets that might not actually be unicode.



[android-developers] Re: Displaying unicode in a TextView?

2010-11-30 Thread Peter Webb
It is not anything specifically to do with Microsoft:

http://en.wikipedia.org/wiki/Mojibake

Double quote marks and apostrophes are frequent offenders. This is
because ASCII contains only a single character for each, but different
characters are used for the start and end quote marks. Short and long
hyphens are another place where ASCII has only a single code for two
different characters.

You need to somehow specify that the HTML string is using UTF-8.
