Java strings are *always* Unicode strings. If you have mojibake, then the
problem occurred before you got a Java string, and you have to fix it there.
You cannot, in general, fix the problem once the string has been
misinterpreted. You can sometimes get away with your getBytes/new String
approach, but it is not semantically correct: it works only when the wrong
decoding happened to preserve every original byte, and it fails whenever it
did not.
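To make that concrete, here is a minimal sketch (the class and method names are made up for illustration, and the four bytes are assumed to be "日本" in Shift-JIS). It misreads the same bytes with two different wrong charsets and checks whether re-encoding gives the original bytes back:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {
    // Decode raw bytes with a (wrong) charset, re-encode with the same
    // charset, and report whether the original bytes came back unchanged.
    static boolean survivesRoundTrip(byte[] raw, Charset wrong) {
        String misread = new String(raw, wrong);
        return Arrays.equals(raw, misread.getBytes(wrong));
    }

    public static void main(String[] args) {
        // Assumed sample: "日本" encoded as Shift-JIS.
        byte[] sjis = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B};
        System.out.println("ISO-8859-1: "
                + survivesRoundTrip(sjis, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      "
                + survivesRoundTrip(sjis, StandardCharsets.UTF_8));
    }
}
```

ISO-8859-1 assigns a character to every byte value, so nothing is lost and the trick works. UTF-8 decoding replaces the invalid bytes with U+FFFD, and after that the original bytes are gone for good. Which case you are in depends on how the string was misread before it reached you.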
What you need to do is get your hands on the raw bytes as a byte[] array.
AlbumCursor.getBlob(1) would be the way to do that -- but the column is
documented as being TEXT, and what getBlob does on a column that is not a
blob is implementation-defined (just as what getString does on a column
that is not a string is implementation-defined).
It seems to me that your data is what's at fault here. What you really need
is some tool to repair it -- actually make new copies of your data, getting
rid of the Shift-JIS or EUC-JP or whatever it's encoded in. SJIS -- and all
national encodings -- are long obsolete, and you'll find increasing
difficulty in handling them. Even the number of variations in SJIS is enough
to give me nightmares. SJIS should not appear in MP3 files -- it violates
the ID3v2 standard. ISO-8859-1 (Latin-1, a superset of ASCII) and UTF-16
with a byte order mark (called UCS-2 in ID3v2.3) are the only options, plus
UTF-16BE and UTF-8 as of ID3v2.4.
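For reference, an ID3v2 text frame body starts with one byte that names the encoding of the text that follows. A minimal sketch of decoding such a body, assuming you have already extracted it from the frame (the class and method names here are hypothetical, not from any library):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Id3TextEncoding {
    // Map the first byte of an ID3v2 text frame body to a charset.
    // Values 0 and 1 exist in ID3v2.3 and ID3v2.4; 2 and 3 are ID3v2.4 only.
    static Charset charsetFor(int encodingByte) {
        switch (encodingByte) {
            case 0: return StandardCharsets.ISO_8859_1;
            case 1: return StandardCharsets.UTF_16;   // BOM decides byte order
            case 2: return StandardCharsets.UTF_16BE; // no BOM
            case 3: return StandardCharsets.UTF_8;
            default:
                throw new IllegalArgumentException(
                        "unknown encoding byte " + encodingByte);
        }
    }

    // Decode a text frame body: one encoding byte followed by the text.
    static String decode(byte[] body) {
        if (body.length == 0) {
            return "";
        }
        Charset cs = charsetFor(body[0] & 0xFF);
        String text = new String(body, 1, body.length - 1, cs);
        // Drop any trailing NUL terminator(s).
        int end = text.length();
        while (end > 0 && text.charAt(end - 1) == '\0') {
            end--;
        }
        return text.substring(0, end);
    }
}
```

There is no value for Shift-JIS. A file that stores Shift-JIS bytes under encoding byte 0 will be decoded as ISO-8859-1 by any reader that follows the standard, and that produces exactly this kind of garbage.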
I don't know if there's any trick you can use to auto-detect the encoding,
based on other information in the file. Certainly, there is no general
solution.
I'd suggest writing a little tool that reads the MP3 file's ID3v2 data (not
via MediaStore -- as raw data), shows you the textual data with various
fixes (and lets you choose which one is actually readable), and then writes
out the data to a new MP3 file. Run this on all your bad files that have
been corrupted by whatever Japanese MP3 editing tool you've used in the
past.
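The "show the text with various fixes" step might look roughly like the sketch below. It assumes you already have the raw text bytes from the tag; the class name, method name, and candidate list are only illustrative. Each candidate charset is tried strictly, so any charset in which the bytes are not even valid gets ruled out, and a human picks from whatever is left:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.LinkedHashMap;
import java.util.Map;

class CandidateDecodings {
    // Try each named charset strictly and keep only those that decode the
    // bytes without malformed or unmappable input. Charsets this JVM does
    // not provide are skipped.
    static Map<String, String> decodeAll(byte[] raw, String... charsetNames) {
        Map<String, String> readable = new LinkedHashMap<>();
        for (String name : charsetNames) {
            if (!Charset.isSupported(name)) {
                continue;
            }
            try {
                String text = Charset.forName(name).newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(raw))
                        .toString();
                readable.put(name, text);
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; rule it out.
            }
        }
        return readable;
    }

    public static void main(String[] args) {
        // Assumed sample: "日本語" encoded as Shift-JIS.
        byte[] raw = {(byte) 0x93, (byte) 0xFA, (byte) 0x96,
                      (byte) 0x7B, (byte) 0x8C, (byte) 0xEA};
        decodeAll(raw, "UTF-8", "Shift_JIS", "EUC-JP", "ISO-8859-1")
                .forEach((name, text) -> System.out.println(name + ": " + text));
    }
}
```

This only narrows the choice. ISO-8859-1 accepts any byte sequence, and short strings are often valid in more than one Japanese encoding, so someone still has to look at the candidates and pick the one that reads correctly.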
Before I went to that effort, though, I'd google for a tool that does this.
Surely someone has already done this.
You'd think that, after decades of dealing with JIS, SJIS variants, and
EUC-JP, and all the confusion and incompatibility that results, people
would not be stuffing SJIS into audio files. But alas.
On Friday, May 13, 2011 10:56:49 AM UTC-7, wang wrote:
Hi,
I have some Japanese music on my device, and I am trying to write a program
to get every track's detailed information from the media store using
    public static final String[] DataProjection = new String[]{
        MediaStore.Audio.Media.ALBUM_ID,
        MediaStore.Audio.Albums.ALBUM,
        MediaStore.Audio.Albums.ARTIST,
        MediaStore.Audio.Media.DATA,
    };
but when I query the data from the media store database, it gives me strings
with mojibake (garbage characters), so I want to convert them before the
media store scans the music files. I know how to convert, for example,
SJIS to UTF-8, like
    newStr = new String(AlbumCursor.getString(1).getBytes("sjis"), "utf-8");
but I cannot know what encoding the string is in, so I cannot convert it to
UTF-8. Can someone give me help or a solution? Thank you!!!