[android-developers] Re: How do I know what encoding the string when I decoded from the media store?

2011-05-14 Thread Bob Kerns
Java strings are *always* Unicode strings. If you have mojibake, then the 
problem occurred before you got a Java string, and you have to fix it there. 
In general, you cannot fix the problem once the bytes have been 
misinterpreted. Although you can sometimes get away with your getBytes/new 
String approach, it is not semantically correct and can fail: any byte 
sequence that was invalid in the wrongly assumed encoding has already been 
replaced during decoding and cannot be recovered.
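
To make the "sometimes works, sometimes fails" point concrete, here is a 
minimal sketch. The class and method names are made up for illustration, and 
it assumes the runtime ships a Shift_JIS charset. SJIS bytes wrongly decoded 
as ISO-8859-1 survive the round trip, because every byte maps to a character; 
the same bytes wrongly decoded as UTF-8 do not, because invalid sequences 
become U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeRoundTrip {
    // Assumption: the runtime provides Shift_JIS (full JDKs and Android normally do).
    static final Charset SJIS = Charset.forName("Shift_JIS");

    /** Re-encodes text with the charset it was wrongly decoded with, then decodes it as the real one. */
    static String repair(String garbled, Charset wronglyUsed, Charset actual) {
        return new String(garbled.getBytes(wronglyUsed), actual);
    }

    public static void main(String[] args) {
        // "Japanese" written in kanji, as Unicode escapes to avoid source-encoding issues.
        String title = "\u65E5\u672C\u8A9E";
        byte[] raw = title.getBytes(SJIS);

        // Wrong decode #1: ISO-8859-1 maps every byte to a character, so nothing is lost.
        String viaLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        // Wrong decode #2: UTF-8 replaces invalid sequences with U+FFFD, so bytes are lost.
        String viaUtf8 = new String(raw, StandardCharsets.UTF_8);

        System.out.println(title.equals(repair(viaLatin1, StandardCharsets.ISO_8859_1, SJIS))); // true
        System.out.println(title.equals(repair(viaUtf8, StandardCharsets.UTF_8, SJIS)));        // false
    }
}
```

Whether the trick works therefore depends on which wrong charset was applied 
upstream, which is exactly what you usually don't know.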

What you need to do is to get your hands on the raw bytes as a byte[]. 
AlbumCursor.getBlob(1) would be the way to do that -- but if the column is 
not a blob, it's implementation-defined what happens. Likewise, calling 
getString() on a column that is not a string is implementation-defined. The 
column is documented as being TEXT.

It seems to me that your data is what's at fault here. What you really need 
is some tool to repair it -- actually make new copies of your data, getting 
rid of the Shift-JIS or EUC-JP or whatever it's encoded in. SJIS -- and all 
national encodings -- are long obsolete, and you'll find increasing 
difficulty in handling them. Even the number of variations in SJIS is enough 
to give me nightmares. SJIS should not appear in MP3 files -- it violates 
the ID3v2 standard. ISO-8859-1 (Latin-1, a superset of ASCII) and UTF-16 
with a byte order mark (called UCS-2 in ID3v2.3), plus UTF-16BE and UTF-8 as 
of ID3v2.4, are the only options.
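
For reference, an ID3v2 text frame starts with one byte that names its 
encoding. Here is a rough sketch of honoring that byte; the class and method 
names are hypothetical, and it assumes you have already extracted the frame 
body. Values 0x02 and 0x03 exist only in ID3v2.4. Files tagged with SJIS 
typically claim 0x00 (ISO-8859-1) and then store SJIS bytes anyway, which is 
where the mojibake comes from.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Id3TextEncoding {
    /** Maps the leading encoding byte of an ID3v2 text frame to the charset the spec assigns to it. */
    static Charset forEncodingByte(int encodingByte) {
        switch (encodingByte) {
            case 0x00: return StandardCharsets.ISO_8859_1;
            case 0x01: return StandardCharsets.UTF_16;   // byte order taken from the BOM
            case 0x02: return StandardCharsets.UTF_16BE; // ID3v2.4 only, no BOM
            case 0x03: return StandardCharsets.UTF_8;    // ID3v2.4 only
            default:
                throw new IllegalArgumentException("Unknown ID3v2 text encoding: " + encodingByte);
        }
    }

    /** Decodes a text frame body (encoding byte followed by the text) as the tag declares it. */
    static String decodeTextFrame(byte[] frameBody) {
        if (frameBody.length == 0) {
            return "";
        }
        Charset charset = forEncodingByte(frameBody[0] & 0xFF);
        String text = new String(frameBody, 1, frameBody.length - 1, charset);
        // Simplification: drop trailing NUL terminators; multi-value frames are not handled.
        int end = text.length();
        while (end > 0 && text.charAt(end - 1) == '\0') {
            end--;
        }
        return text.substring(0, end);
    }
}
```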

I don't know if there's any trick you can use to auto-detect the encoding, 
based on other information in the file. Certainly, there is no general 
solution.

I'd suggest writing a little tool that reads the MP3 file's ID3v2 data (not 
via MediaStore -- as raw data), shows you the textual data with various 
fixes (and lets you choose which one is actually readable), and then writes 
out the data to a new MP3 file. Run this on all your bad files that have 
been corrupted by whatever Japanese MP3 editing tool you've used in the 
past.
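
The "show the text with various fixes" step could look roughly like this. It 
is only a sketch: the class name and the candidate list are my own choices, 
and reading the raw bytes out of the file is left out. It decodes the raw 
bytes strictly with each candidate charset and keeps only the ones that 
decode without errors, so a human can pick the readable result.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.util.LinkedHashMap;
import java.util.Map;

final class CandidateDecodings {
    // Assumption: a plausible candidate list for Japanese tags; adjust to taste.
    static final String[] CANDIDATES = {"UTF-8", "Shift_JIS", "EUC-JP", "ISO-2022-JP"};

    /** Returns charset name -> decoded text for every candidate that decodes the bytes without error. */
    static Map<String, String> decodeAll(byte[] raw) {
        Map<String, String> decoded = new LinkedHashMap<>();
        for (String name : CANDIDATES) {
            if (!Charset.isSupported(name)) {
                continue; // this runtime does not ship the charset
            }
            try {
                String text = Charset.forName(name).newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(raw))
                        .toString();
                decoded.put(name, text);
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; skip it.
            }
        }
        return decoded;
    }
}
```

More than one candidate can decode cleanly, so this narrows the choice rather 
than making it; someone who can read the text still has to pick.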

Before I went to that effort, though, I'd google for a tool that does this. 
 Surely someone has already done this.

You'd think that, after decades of dealing with JIS, SJIS variants, and 
EUC-JP, and all the confusion and incompatibility that results, people 
would not be stuffing SJIS into audio files. But alas.

On Friday, May 13, 2011 10:56:49 AM UTC-7, wang wrote:

 Hi, 

 I have some Japanese music on my device, and I am trying to write a program 
 to get each track's detailed information from the media store using 

 public static final String[] DataProjection = new String[]{ 
   MediaStore.Audio.Media.ALBUM_ID, 
   MediaStore.Audio.Albums.ALBUM, 
   MediaStore.Audio.Albums.ARTIST, 
   MediaStore.Audio.Media.DATA, 
 }; 

 but when I query the data from the media store database, it gives me 
 strings with mojibake (garbage characters), so I want to convert them 
 before the media store scans the music files. I know how to convert, for 
 example, SJIS to UTF-8, like 
 newStr = new 
 String(AlbumCursor.getString(1).getBytes("SJIS"), "UTF-8"); 

 but I cannot know what encoding the string is in, so I cannot convert it 
 to UTF-8. Can someone give me help or a solution? Thank you!!!

-- 
You received this message because you are subscribed to the Google
Groups Android Developers group.
To post to this group, send email to android-developers@googlegroups.com
To unsubscribe from this group, send email to
android-developers+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/android-developers?hl=en

[android-developers] Re: How do I know what encoding the string when I decoded from the media store?

2011-05-14 Thread Bob Kerns
By the way, you have it backwards, I think.

newStr = new String(AlbumCursor.getString(1).getBytes("UTF-8"), "SJIS");

That is, turn the string back into the bytes it was wrongly decoded from 
(assuming it was decoded as UTF-8; Java strings themselves are UTF-16 
internally), and then re-interpret those bytes as SJIS.
