[android-developers] Re: Displaying unicode in a TextView?
AIIY You might want to check the documentation before you make such statements:
http://developer.android.com/intl/de/reference/java/lang/String.html#String(byte[])

Do *NOT*, under *ANY* circumstance, omit that second argument. Just because it works, TODAY, on YOUR device, does not make this correct programming.

Your conclusion, however, is correct - char-at-a-time conversion cannot work in all cases, and byte-at-a-time is even worse. You really do have to treat it as a byte stream. Just make sure you nail down the encoding of that byte stream.

For more on this, see the last few items on my blog:
http://bobkerns.typepad.com/bob_kerns_thinking/2010/12/yet-more-about-utf-8-the-evils-of-platform-defaults.html

In addition to that article, note the subsequent link to the Oracle site. Among many other topics, that article discusses the new (as of Java 5) APIs to handle conversion of surrogate characters properly, and this is an area where char-at-a-time encoding fails.

On Dec 1, 4:51 pm, HippoMan hippo.mail...@gmail.com wrote:
> I should clarify that I now don't need to do this:
>
>     String content = new String(bytes, "UTF-8");
>
> This is because java's default is unicode. I get the same result with or without the second argument to the String constructor. I now see that my original error resulted because I was converting to individual chars and appending them one-by-one, instead of dealing with a mass of bytes, which then could have been properly converted to unicode.

--
You received this message because you are subscribed to the Google Groups Android Developers group.
To post to this group, send email to android-developers@googlegroups.com
To unsubscribe from this group, send email to android-developers+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/android-developers?hl=en
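A minimal sketch of the surrogate-pair point above, using the code point methods that java.lang.String and java.lang.Character have had since Java 5. The class and method names (CodePointDemo, countCodePoints, toCodePoints) are made up for illustration and do not come from the thread or the linked articles.

```java
class CodePointDemo {
    // Counts Unicode code points, not UTF-16 chars.
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Walks the string one code point at a time, stepping over surrogate pairs.
    static int[] toCodePoints(String s) {
        int[] out = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            out[n++] = cp;
            i += Character.charCount(cp); // 1 for BMP characters, 2 for a surrogate pair
        }
        return out;
    }

    public static void main(String[] args) {
        // U+1D11E (musical G clef) lies outside the BMP, so UTF-16 needs two chars for it.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("chars: " + clef.length());
        System.out.println("code points: " + countCodePoints(clef));
    }
}
```

Any loop that handles one char at a time sees the two halves of such a pair separately, which is why char-at-a-time encoding breaks on these characters.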
[android-developers] Re: Displaying unicode in a TextView?
Ah, yes. I see that I just happened to luck out, as the file.encoding property on my device must be (currently!) set to utf-8. Thanks. This begs another, related question: how do I know what encoding to use, in the first place ... for a TextView in the Android environment? If I cannot count on the file.encoding property to always be set correctly when using a TextView in Android, where do I query for the correct encoding value?
[android-developers] Re: Displaying unicode in a TextView?
PS: I am writing Android-specific code. The class I am using will never work outside of the Android environment, for reasons that go beyond the issue of character encoding. So does this mean that in my case, I _should_ do the moral equivalent of this?

    String content = new String(bytes, System.getProperty("file.encoding"));

If so, this would indeed reduce to the following:

    String content = new String(bytes);
Re: [android-developers] Re: Displaying unicode in a TextView?
02.12.2010 13:47, HippoMan wrote:
> This begs another, related question: how do I know what encoding to use, in the first place ... for a TextView in the Android environment?

You don't. This is not a TextView encoding issue. TextView works with Java strings, which are always Unicode.

> If I cannot count on the file.encoding property to always be set correctly when using a TextView in Android, where do I query for the correct encoding value?

The issue arose when constructing a Java String from a byte array. As Bob pointed out, that's where you need to specify the encoding (by always using the second argument to String(byte[], String)).

The encoding used here needs to match the encoding that was used to construct the byte array (first argument). If those bytes came from a file, then you should somehow know the encoding. Some Unicode files start with a marker (0xFEFF or 0xFFFE), but checking for this marker isn't reliable and shouldn't be necessary.

Here is some code that is a little simpler than using your own byte array:

    InputStream stream = ...; // can be a file input stream
    InputStreamReader streamReader = new InputStreamReader(stream, "UTF-8");
    BufferedReader textReader = new BufferedReader(streamReader);
    // read from textReader

-- Kostya Vasilyev -- WiFi Manager + pretty widget -- http://kmansoft.wordpress.com
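The reader-based approach above could be completed roughly as follows. This is only a sketch: the helper name readAll, the 4096-char buffer, and the in-memory test stream are assumptions for illustration, not part of the original message.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class ReaderDemo {
    // Reads the whole stream as text, decoding with the named charset.
    // The reader carries decoder state across reads, so a multi-byte
    // character split between two underlying reads is still decoded correctly.
    static String readAll(InputStream in, String charsetName) throws IOException {
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, charsetName));
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Curly quotes take three bytes each in UTF-8.
        byte[] utf8 = "\u201CHello\u201D".getBytes("UTF-8");
        String text = readAll(new ByteArrayInputStream(utf8), "UTF-8");
        System.out.println(text.length()); // number of chars after decoding
    }
}
```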
Re: [android-developers] Re: Displaying unicode in a TextView?
02.12.2010 13:54, HippoMan wrote:
> PS: I am writing Android-specific code. The class I am using will never work outside of the Android environment, for reasons that go beyond the issue of character encoding. So does this mean that in my case, I _should_ do the moral equivalent of this?
>
>     String content = new String(bytes, System.getProperty("file.encoding"));

No, you shouldn't. file.encoding is a system-wide property, and if it matches *your application's* content, it's only by pure luck. These are your files, you should know what encoding they are in. If they are UTF-8, go ahead and specify that encoding in your code.

A side note - perhaps every actual Android firmware sets file.encoding to UTF-8, but I don't see any guarantees to that in the SDK documentation.

> If so, this would indeed reduce to the following:
>
>     String content = new String(bytes);

-- Kostya Vasilyev -- WiFi Manager + pretty widget -- http://kmansoft.wordpress.com
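To make the difference concrete, here is a small sketch that contrasts decoding with a named charset against decoding with whatever the platform default happens to be. It assumes java.nio.charset.StandardCharsets is available (Java 7 and later, which postdates this thread; on older runtimes the string name "UTF-8" does the same job). The class and method names are invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Decodes with the encoding the producer of the bytes is known to have used.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Spells out what new String(bytes) does: it uses the platform default charset.
    static String decodePlatformDefault(byte[] bytes) {
        return new String(bytes, Charset.defaultCharset());
    }

    public static void main(String[] args) {
        byte[] leftQuote = { (byte) 0xE2, (byte) 0x80, (byte) 0x9C }; // UTF-8 for U+201C
        System.out.println("platform default: " + Charset.defaultCharset().name());
        // This comparison is only true on platforms whose default happens to be UTF-8.
        System.out.println(decodeUtf8(leftQuote).equals(decodePlatformDefault(leftQuote)));
    }
}
```

The first method gives the same result on every device; the second gives the right result only where the default charset matches the data.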
[android-developers] Re: Displaying unicode in a TextView?
Yes. I should have written this after drinking my morning coffee. On my way to work, I woke up a little and remembered that this encoding pertains to the _source_ (in my case, the epub bundle) and not the _destination_ (the TextView). Luckily, I know something about the source: epubs are supposed to be encoded in UTF-8, so this will be the default encoding that I will use. I can give the user an option to change this for epub books that are encoded in a non-standard manner.
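One possible shape for "UTF-8 by default, with a user override" is sketched below. The userChoice parameter stands in for a hypothetical application setting, and the class and method names are invented for the example; unknown or malformed charset names fall back to UTF-8.

```java
import java.nio.charset.Charset;

class EpubCharsetChoice {
    static final String DEFAULT_CHARSET = "UTF-8";

    // Returns the user's charset if it names a supported charset, else UTF-8.
    // userChoice is a hypothetical preference value and may be null.
    static Charset resolve(String userChoice) {
        if (userChoice != null) {
            try {
                if (Charset.isSupported(userChoice)) {
                    return Charset.forName(userChoice);
                }
            } catch (IllegalArgumentException e) {
                // Malformed charset name: fall through to the default.
            }
        }
        return Charset.forName(DEFAULT_CHARSET);
    }

    // Decodes the bytes of one epub item with the resolved charset.
    static String decode(byte[] bytes, String userChoice) {
        return new String(bytes, resolve(userChoice));
    }
}
```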
[android-developers] Re: Displaying unicode in a TextView?
Thank you. I have checked the items that I am displaying, and all of them contain tags like this:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>

In every case, the charset is specified as utf-8 or UTF-8. Apparently, this is not sufficient to cause the text to be interpreted as containing unicode. What can I do to ensure this?
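A charset declaration inside the markup is only a hint stored in the bytes; nothing happens unless the code that turns bytes into a String reads it and passes it to the decoder. The sketch below shows one way that could be done, under stated assumptions: it looks only at the first 1024 bytes, uses a simple regular expression rather than a real HTML parser, ignores XML prologs, and falls back to UTF-8. The class name and the 1024-byte limit are arbitrary choices for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaCharsetSniffer {
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    // Looks for "charset=..." near the start of the document.
    static Charset sniff(byte[] bytes) {
        int n = Math.min(bytes.length, 1024);
        // ISO-8859-1 maps every byte to one char, so ASCII markup survives intact.
        String head = new String(bytes, 0, n, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(head);
        if (m.find()) {
            try {
                if (Charset.isSupported(m.group(1))) {
                    return Charset.forName(m.group(1));
                }
            } catch (IllegalArgumentException e) {
                // Malformed name in the document: use the fallback.
            }
        }
        return StandardCharsets.UTF_8;
    }

    // Decodes the whole document with the declared (or fallback) charset.
    static String decode(byte[] bytes) {
        return new String(bytes, sniff(bytes));
    }
}
```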
Re: [android-developers] Re: Displaying unicode in a TextView?
It would help if you were more specific about the problem.

1) What Unicode character code are you setting in the TextView that displays as garbage?

2) What exactly does "garbage" mean: rectangles, or unexpected characters like upside-down question marks, etc.?

3) Please post the code you are using to get the contents over the network. I suspect the issue is converting a stream of bytes to characters.
[android-developers] Re: Displaying unicode in a TextView?
Thank you very much.

1) I am not explicitly setting any unicode character code in the TextView. I display data that exists within e-books that are stored in epub format. I do not alter this data at all. I display it as is. Most of this displays OK, but quotes look like the following garbage ...

2) Open double quote:
     a-circumflex
     something similar to a Euro symbol
     a scrunched together o-e
   Close double quote:
     a-circumflex
     something similar to a Euro symbol
     question mark in a black diamond

3) I do not get the contents over the network. These are e-books in epub format that already reside on my sdcard. The mojibake is rendered perfectly (i.e., not as mojibake) by other e-readers such as Laputa, Aldiko, FBReader, Nook, Kindle, etc.

This is the code that reads the data from the epub file.

    // this.file is a String which contains the pathname to the
    // e-book on my sdcard.
    // this.itemMap is a LinkedHashMap which holds the contents
    // of each item in the e-book, keyed by the name of the item.
    // It's initialized to null and instantiated via lazy
    // evaluation.
    // More extensive error handling will be added later.
    private boolean readEpubFile() {
        FileInputStream f = null;
        ZipInputStream z = null;
        try {
            f = new FileInputStream(this.file);
            z = new ZipInputStream(f);
            ZipEntry ze;
            while ((ze = z.getNextEntry()) != null) {
                StringBuilder sb = new StringBuilder();
                for (int c = z.read(); c >= 0; c = z.read()) {
                    sb.append((char) c);
                }
                String name = ze.getName();
                if (this.itemMap == null) {
                    this.itemMap = new LinkedHashMap<String, String>();
                }
                this.itemMap.put(name, sb.toString());
            }
        } catch (Throwable t) {
            return (false);
        } finally {
            try { z.close(); } catch (Throwable t) { }
            try { f.close(); } catch (Throwable t) { }
        }
        return (true);
    }
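The read loop in the method above widens each byte to a char, which amounts to decoding the data as ISO-8859-1 no matter what encoding the file really uses. A small sketch that isolates this effect follows; the class and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

class ByteCastDemo {
    // Mirrors the loop in readEpubFile: every byte becomes one char.
    static String castEachByte(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append((char) (b & 0xFF)); // same 0..255 value that InputStream.read() returns
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A left curly quote (U+201C) is three bytes in UTF-8: E2 80 9C.
        byte[] utf8 = "\u201C".getBytes(StandardCharsets.UTF_8);
        System.out.println(castEachByte(utf8).length());                        // chars after casting
        System.out.println(new String(utf8, StandardCharsets.UTF_8).length());  // chars after decoding
    }
}
```

One quote character turns into three unrelated chars, the first of which is a-circumflex (U+00E2), matching the first symbol in the garbage described in item 2.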
[android-developers] Re: Displaying unicode in a TextView?
PS: I forgot to add that after I build this map, I just do the following to display the text within my TextView:

    // this.section is a String which holds the name of the
    // ebook section that I want to view. It must be a key
    // to the above-mentioned LinkedHashMap containing the
    // epub data.
    // this.view is the TextView object
    // this.text is a String
    this.text = this.itemMap.get(this.section);
    if (this.text != null) {
        this.view.setText(Html.fromHtml(this.text).toString());
    }
[android-developers] Re: Displaying unicode in a TextView?
OK. I figured it out after thinking more about what you said in your item 3. I need to convert the bytes that come out of the zip file into correct unicode. I changed the method as follows, and now it renders the characters properly:

    private boolean readEpubFile() {
        FileInputStream f = null;
        ZipInputStream z = null;
        byte[] buffer = new byte[65536];
        try {
            f = new FileInputStream(this.file);
            z = new ZipInputStream(f);
            ZipEntry ze;
            while ((ze = z.getNextEntry()) != null) {
                StringBuilder sb = new StringBuilder();
                int totlen = 0;
                int len = 0;
                while ((len = z.read(buffer)) > 0) {
                    sb.append(new String(buffer, 0, len, "UTF-8"));
                    totlen += len;
                }
                String name = ze.getName();
                if (this.itemMap == null) {
                    this.itemMap = new LinkedHashMap<String, String>();
                }
                this.itemMap.put(name, sb.toString());
            }
        } catch (Throwable t) {
            return (false);
        } finally {
            try { z.close(); } catch (Throwable t) { }
            try { f.close(); } catch (Throwable t) { }
        }
        return (true);
    }

Thanks again to all of you for your help.
[android-developers] Re: Displaying unicode in a TextView?
... but I should actually use a ByteArrayOutputStream to avoid breaking up unicode characters that might span the 65536-byte boundary of my input buffer:

    private boolean readEpubFile() {
        FileInputStream f = null;
        ZipInputStream z = null;
        byte buffer[] = new byte[65536];
        try {
            f = new FileInputStream(this.file);
            z = new ZipInputStream(f);
            ZipEntry ze;
            while ((ze = z.getNextEntry()) != null) {
                ByteArrayOutputStream bs = new ByteArrayOutputStream();
                int len = 0;
                while ((len = z.read(buffer)) > 0) {
                    bs.write(buffer, 0, len);
                }
                byte[] bytes = bs.toByteArray();
                if (this.itemMap == null) {
                    this.itemMap = new LinkedHashMap<String, String>();
                }
                String name = ze.getName();
                String content = new String(bytes);
                this.itemMap.put(name, content);
            }
        } catch (Throwable t) {
            return (false);
        }
        return (true);
    }
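The boundary problem can be reproduced with a deliberately tiny buffer. The sketch below is an illustration, not the poster's code: it reads from an in-memory stream instead of a ZipInputStream, the class and method names are made up, and it passes UTF-8 explicitly (via StandardCharsets, Java 7 and later) rather than relying on the platform default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class ChunkBoundaryDemo {
    // Decodes every chunk on its own; a character split across two chunks is damaged.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];
        StringBuilder sb = new StringBuilder();
        int len;
        while ((len = in.read(buffer)) > 0) {
            sb.append(new String(buffer, 0, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Collects all bytes first and decodes once, so chunk boundaries do not matter.
    static String decodeWhole(InputStream in, int chunkSize) throws IOException {
        byte[] buffer = new byte[chunkSize];
        ByteArrayOutputStream bs = new ByteArrayOutputStream();
        int len;
        while ((len = in.read(buffer)) > 0) {
            bs.write(buffer, 0, len);
        }
        return new String(bs.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // 'a' is one byte and U+201C is three, so a 2-byte buffer splits the quote.
        byte[] data = "a\u201C".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodePerChunk(new ByteArrayInputStream(data), 2).equals("a\u201C"));
        System.out.println(decodeWhole(new ByteArrayInputStream(data), 2).equals("a\u201C"));
    }
}
```

With a real ZipInputStream the split points are harder to predict, because read() may return fewer bytes than the buffer holds.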
[android-developers] Re: Displaying unicode in a TextView?
I should clarify that I now don't need to do this:

    String content = new String(bytes, "UTF-8");

This is because java's default is unicode. I get the same result with or without the second argument to the String constructor. I now see that my original error resulted because I was converting to individual chars and appending them one-by-one, instead of dealing with a mass of bytes, which then could have been properly converted to unicode.
[android-developers] Re: Displaying unicode in a TextView?
Thanks. I'll check the content variable later today or tomorrow and post back here. In the meantime, I'm wondering if perhaps this isn't a unicode issue, after all. Upon closer examination, it seems that the only characters that appear as garbage in the data I'm examining are quote characters and apostrophes. I vaguely remember seeing this problem in the past with some Microsoft character sets that might not actually be unicode.
[android-developers] Re: Displaying unicode in a TextView?
It is not anything specifically to do with Microsoft:
http://en.wikipedia.org/wiki/Mojibake

Double quote marks and apostrophes are frequent offenders. This is because ASCII contains only a single character for each, but different characters are used for the start and end quote marks. Short and long hyphens are another place where ASCII has only a single code for two different characters.

You need to somehow specify that the HTML string is using UTF-8.
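The typical quote-mark mojibake can be reproduced by encoding a curly quote as UTF-8 and then reading those bytes back with a single-byte charset. The sketch below uses windows-1252 for the misreading; that choice, along with the class and method names, is an assumption made for illustration, and it relies on the runtime shipping a charset by that name.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text as UTF-8, then wrongly decodes those bytes as windows-1252.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // Plain ASCII survives the mistake, which is why most of the text looks fine.
        System.out.println(misread("plain text"));
        // A left curly quote (U+201C) comes back as three separate characters.
        System.out.println(misread("\u201C").length());
    }
}
```

ASCII characters have the same byte values in both encodings, so only the characters outside ASCII, such as curly quotes, apostrophes, and long dashes, turn into garbage.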