Hi,

This is something we can look into. I believe we already do something similar with filenames that are discovered via a list operation in a recursive/wildcard transfer. If you do a 'globus-url-copy gsiftp://hostname/path/start-of-utf8-filename*', it may work.

Can you open an enhancement bug report under GridFTP at http://bugzilla.globus.org?
Thanks,
Mike

Hai-Ning Wu wrote:
> Hello,
>
> Since gridftp is developed according to RFC, I found out that such
> encoding problems can be easily removed by converting the input strings
> into percent-encoding. (http://en.wikipedia.org/wiki/Percent-encoding)
> Thus I wrote a function that converts the non-ascii characters to
> percent-encoding. Once "globus-url-copy" calls this function before
> going deeper, it is able to handle utf8 strings.
> I am just wondering if the gridftp developers can take this into
> consideration, because it is only a few lines of code and it makes the
> multilingual world much easier.
>
> Below is my code:
> char *
> globus_l_guc_convert_utf8_url(char * origin_url)
> {
>     char * ascii_only_url;
>     char hex[17] = "0123456789ABCDEF";
>     int pos1, pos2;
>
>     ascii_only_url = (char *) malloc(strlen(origin_url) * 2 * sizeof(char));
>     pos1 = pos2 = 0;
>     while (origin_url[pos1] != '\0') {
>         if (origin_url[pos1] >= 0)
>             ascii_only_url[pos2++] = origin_url[pos1++];
>         else {
>             ascii_only_url[pos2++] = '%';
>             ascii_only_url[pos2++] = hex[(unsigned char)origin_url[pos1]/16];
>             ascii_only_url[pos2++] = hex[(unsigned char)origin_url[pos1]%16];
>             pos1++;
>         }
>     }
>     ascii_only_url[pos2] = '\0';
>     return ascii_only_url;
> }
>
>
> - Hai-Ning
>
> ----- Original Message ----- From: "Dan Gunter" <[EMAIL PROTECTED]>
> To: "Hai-Ning Wu" <[EMAIL PROTECTED]>
> Cc: <[email protected]>
> Sent: Tuesday, May 06, 2008 9:29 PM
> Subject: Re: [gt-user] Unable to transfer file names encoded in UTF8
> using GridFTP
>
>
>> The reason gridftp is picky is that URLs can only have US-ASCII
>> characters in them, and it doesn't want to do the encoding for you
>> because, I assume, that would be a fair amount of work. See RFC 1738,
>> 1808, and 2396 for details if you are interested in tackling this
>> yourself. One approach may be to simply wrap the guc command with an
>> encoder.
>>
>> -Dan
>>
>> Hai-Ning Wu wrote:
>>> Hello,
>>>
>>> I tried to transfer files with Chinese file names using
>>> "globus-url-copy" but failed to do so. The error message is "error:
>>> [globus_gass_copy_get_url_mode]: globus_url_parse returned error code:
>>> -8 for url: <my file path>"
>>>
>>> To see which part went wrong, I traced the source code of
>>> globus-url-copy and, finally, I found out that the problem came from
>>> $gt_home/source-trees/common/source/library/globus_url.c.
>>>
>>> This is a small piece of the code from globus_url_get_path() where
>>> the problem occurs:
>>>     if(isalnum((*stringp)[pos]) ||
>>>        globusl_url_issafe((*stringp)[pos]) ||
>>>        globusl_url_isextra((*stringp)[pos]) ||
>>>        globusl_url_isscheme_special((*stringp)[pos]) ||
>>>        (*stringp)[pos] == '~' || /* incorrect, but de facto */
>>>        (*stringp)[pos] == '/'||
>>>        (*stringp)[pos] == ' ') /* to be nice */
>>>     {
>>>         pos++;
>>>     }
>>>
>>> The function "globus_url_get_path()" checks the validity of the path
>>> before retrieving its substring. It only accepts ASCII characters and
>>> omits any other characters. However, Chinese characters are encoded
>>> in UTF-8, and most UTF-8 characters begin with a "1" as their
>>> leading bit. This is why Chinese file names did not work with
>>> globus-url-copy.
>>>
>>> I cannot understand the exact function of the code above. I mean it
>>> seems ok to work with characters other than ASCII codes. So I am just
>>> wondering if it is appropriate to let that function accept them, in
>>> order to accept UTF-8 strings.
>>>
>>> By the way, I think it is important to make grid middleware like
>>> globus support multiple languages, since grid computing requires
>>> global cooperation. For example, if developers considered more than
>>> just ASCII code, or programmed in unicode, life would be much easier.
>>> However, as far as I have experienced, most programs lack
>>> multi-language features.
>>>
>>> Any comments would be helpful. Thanks.
>>>
>>> Hai-Ning
>>>
>>> --
>>> Hai-Ning Wu
>>> Academia Sinica Grid Computing
>>> Taipei, Taiwan
>>> Email: [EMAIL PROTECTED]
>>>
>>
>>
>> --
>> Dan Gunter. voice:510-495-2504 fax:510-486-6363 dsd.lbl.gov/~dang
>>
>>
>
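Two details of the conversion function quoted above look worth tightening before it goes into a bug report. Each non-ASCII byte expands to three output characters ('%' plus two hex digits), so the worst-case output is three times the input length plus a terminating NUL, while the quoted code allocates only twice the input length. The `>= 0` test also relies on plain `char` being signed, which is implementation-defined in C. The following is only a sketch of the same idea with those two points addressed; the name `percent_encode_non_ascii` is hypothetical and not part of the Globus source. Like the quoted function, it leaves all ASCII bytes untouched, including any '%' already present in the URL.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not a Globus API): return a newly allocated copy of
 * origin_url in which every byte >= 0x80 is replaced by "%XX".
 * ASCII bytes are copied unchanged. The caller must free() the result.
 * Returns NULL if the allocation fails.
 */
char *
percent_encode_non_ascii(const char *origin_url)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = strlen(origin_url);
    /* Worst case: every byte becomes three characters, plus the NUL. */
    char *out = malloc(len * 3 + 1);
    size_t in_pos;
    size_t out_pos = 0;

    if (out == NULL)
        return NULL;

    for (in_pos = 0; in_pos < len; in_pos++) {
        /* Work on unsigned bytes so the test does not depend on whether
         * plain char is signed on this platform. */
        unsigned char byte = (unsigned char) origin_url[in_pos];

        if (byte < 0x80) {
            out[out_pos++] = (char) byte;
        } else {
            out[out_pos++] = '%';
            out[out_pos++] = hex[byte >> 4];
            out[out_pos++] = hex[byte & 0x0F];
        }
    }
    out[out_pos] = '\0';
    return out;
}
```

Given a URL whose path contains the UTF-8 bytes E4 B8 AD, this returns the same URL with %E4%B8%AD in their place, which is the form the ASCII-only check in globus_url_get_path() is described as accepting.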
