Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
On Wed, Sep 25, 2013 at 5:01 AM, Matt Fleming wrote: > On Sun, 22 Sep, at 05:24:28PM, H. Peter Anvin wrote: >> On 09/22/2013 04:07 PM, Roy Franz wrote: >> > On Sun, Sep 22, 2013 at 3:54 PM, H. Peter Anvin wrote: >> >> Sorry this version is broken and doesn't even compile due to remaining >> >> options_size references. >> > >> > I compiled and tested this series on both x86_64 (using OVMF) and on >> > the ARM simulator. I just doubled checked >> > my kernel .config to verify this was not being omitted and I'm pretty >> > sure this doesn't have any compilation problems. >> > I did make a few changes to get the untested version you sent out to >> > compile, but they all seemed to be straightforward typo type fixes. >> > I'll gladly address any defects in this patch, but I don't see an >> > compilation problems. >> > >> >> Ah yes, I see now... you fixed up the compile problem but did so >> incorrectly. > > Folks, I'm gonna drop this patch for now. Feel free to resend it once > everyone's happy with it. There's plenty of time to get this patch > applied, it just doesn't make sense to hold up the rest of the series. I'll get an updated (and independent) version out in the next couple days that addresses HPA's feedback. Thanks, Roy > > -- > Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
On Sun, 22 Sep, at 05:24:28PM, H. Peter Anvin wrote: > On 09/22/2013 04:07 PM, Roy Franz wrote: > > On Sun, Sep 22, 2013 at 3:54 PM, H. Peter Anvin wrote: > >> Sorry this version is broken and doesn't even compile due to remaining > >> options_size references. > > > > I compiled and tested this series on both x86_64 (using OVMF) and on > > the ARM simulator. I just doubled checked > > my kernel .config to verify this was not being omitted and I'm pretty > > sure this doesn't have any compilation problems. > > I did make a few changes to get the untested version you sent out to > > compile, but they all seemed to be straightforward typo type fixes. > > I'll gladly address any defects in this patch, but I don't see an > > compilation problems. > > > > Ah yes, I see now... you fixed up the compile problem but did so > incorrectly. Folks, I'm gonna drop this patch for now. Feel free to resend it once everyone's happy with it. There's plenty of time to get this patch applied, it just doesn't make sense to hold up the rest of the series. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
On Sun, 22 Sep, at 05:24:28PM, H. Peter Anvin wrote: On 09/22/2013 04:07 PM, Roy Franz wrote: On Sun, Sep 22, 2013 at 3:54 PM, H. Peter Anvin h...@zytor.com wrote: Sorry this version is broken and doesn't even compile due to remaining options_size references. I compiled and tested this series on both x86_64 (using OVMF) and on the ARM simulator. I just doubled checked my kernel .config to verify this was not being omitted and I'm pretty sure this doesn't have any compilation problems. I did make a few changes to get the untested version you sent out to compile, but they all seemed to be straightforward typo type fixes. I'll gladly address any defects in this patch, but I don't see an compilation problems. Ah yes, I see now... you fixed up the compile problem but did so incorrectly. Folks, I'm gonna drop this patch for now. Feel free to resend it once everyone's happy with it. There's plenty of time to get this patch applied, it just doesn't make sense to hold up the rest of the series. -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
On Wed, Sep 25, 2013 at 5:01 AM, Matt Fleming m...@console-pimps.org wrote: On Sun, 22 Sep, at 05:24:28PM, H. Peter Anvin wrote: On 09/22/2013 04:07 PM, Roy Franz wrote: On Sun, Sep 22, 2013 at 3:54 PM, H. Peter Anvin h...@zytor.com wrote: Sorry this version is broken and doesn't even compile due to remaining options_size references. I compiled and tested this series on both x86_64 (using OVMF) and on the ARM simulator. I just doubled checked my kernel .config to verify this was not being omitted and I'm pretty sure this doesn't have any compilation problems. I did make a few changes to get the untested version you sent out to compile, but they all seemed to be straightforward typo type fixes. I'll gladly address any defects in this patch, but I don't see an compilation problems. Ah yes, I see now... you fixed up the compile problem but did so incorrectly. Folks, I'm gonna drop this patch for now. Feel free to resend it once everyone's happy with it. There's plenty of time to get this patch applied, it just doesn't make sense to hold up the rest of the series. I'll get an updated (and independent) version out in the next couple days that addresses HPA's feedback. Thanks, Roy -- Matt Fleming, Intel Open Source Technology Center -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
On 09/22/2013 04:07 PM, Roy Franz wrote: > On Sun, Sep 22, 2013 at 3:54 PM, H. Peter Anvin wrote: >> Sorry this version is broken and doesn't even compile due to remaining >> options_size references. > > I compiled and tested this series on both x86_64 (using OVMF) and on > the ARM simulator. I just doubled checked > my kernel .config to verify this was not being omitted and I'm pretty > sure this doesn't have any compilation problems. > I did make a few changes to get the untested version you sent out to > compile, but they all seemed to be straightforward typo type fixes. > I'll gladly address any defects in this patch, but I don't see an > compilation problems. > Ah yes, I see now... you fixed up the compile problem but did so incorrectly. int load_options_size = image->load_options_size / 2; /* ASCII */ This is a number of UTF-16 chars, the comment is completely wrong; - while (*s2 && *s2 != '\n' && options_size < load_options_size) { + while (*s2 && *s2 != '\n' && options_bytes < load_options_size) { + options_bytes += efi_utf8_bytes(*s2); s2++; - options_size++; } + options_chars = s2 - options; You can't compare options_bytes against load_options_size; the latter being a UTF-16 shortword count. So the loop really needs to update options_chars in the loop to compare it against load_options_size: while (*s2 && *s2 != '\n' && options_chars < load_options_size) { options_bytes += efi_utf8_bytes(*s2++); option_chars++; } -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
On Sun, Sep 22, 2013 at 3:54 PM, H. Peter Anvin wrote: > Sorry this version is broken and doesn't even compile due to remaining > options_size references. I compiled and tested this series on both x86_64 (using OVMF) and on the ARM simulator. I just doubled checked my kernel .config to verify this was not being omitted and I'm pretty sure this doesn't have any compilation problems. I did make a few changes to get the untested version you sent out to compile, but they all seemed to be straightforward typo type fixes. I'll gladly address any defects in this patch, but I don't see an compilation problems. Thanks, Roy > > Roy Franz wrote: >>From: "H. Peter Anvin" >> >>Improve the conversion of the UTF-16 EFI command line >>to UTF-8 for passing to the kernel. >> >>Signed-off-by: Roy Franz >>--- >> arch/x86/boot/compressed/eboot.c |3 +- >>drivers/firmware/efi/efi-stub-helper.c | 92 >> >> 2 files changed, 72 insertions(+), 23 deletions(-) >> >>diff --git a/arch/x86/boot/compressed/eboot.c >>b/arch/x86/boot/compressed/eboot.c >>index 5e708c0..4723dc89 100644 >>--- a/arch/x86/boot/compressed/eboot.c >>+++ b/arch/x86/boot/compressed/eboot.c >>@@ -486,8 +486,7 @@ struct boot_params *make_boot_params(void *handle, >>efi_system_table_t *_table) >> hdr->type_of_loader = 0x21; >> >> /* Convert unicode cmdline to ascii */ >>- cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image, >>- _size); >>+ cmdline_ptr = efi_convert_cmdline(sys_table, image, _size); >> if (!cmdline_ptr) >> goto fail; >> hdr->cmd_line_ptr = (unsigned long)cmdline_ptr; >>diff --git a/drivers/firmware/efi/efi-stub-helper.c >>b/drivers/firmware/efi/efi-stub-helper.c >>index 335d17d..8331892 100644 >>--- a/drivers/firmware/efi/efi-stub-helper.c >>+++ b/drivers/firmware/efi/efi-stub-helper.c >>@@ -548,61 +548,111 @@ static efi_status_t >>efi_relocate_kernel(efi_system_table_t *sys_table_arg, >> >> return status; >> } >>-/* Convert the unicode UEFI command line to ASCII to pass to kernel. >>+ >>+/* >>+ * Get the number of UTF-8 bytes corresponding to an UTF-16 character. >>+ * This overestimates for surrogates, but that is okay. >>+ */ >>+static int efi_utf8_bytes(u16 c) >>+{ >>+ return 1 + (c >= 0x80) + (c >= 0x800); >>+} >>+ >>+/* >>+ * Convert an UTF-16 string, not necessarily null terminated, to >>UTF-8. >>+ */ >>+static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n) >>+{ >>+ unsigned int c; >>+ >>+ while (n--) { >>+ c = *src++; >>+ if (n && c >= 0xd800 && c <= 0xdbff && >>+ *src >= 0xdc00 && *src <= 0xdfff) { >>+ c = 0x1 + ((c & 0x3ff) << 10) + (*src & 0x3ff); >>+ src++; >>+ n--; >>+ } >>+ if (c >= 0xd800 && c <= 0xdfff) >>+ c = 0xfffd; /* Unmatched surrogate */ >>+ if (c < 0x80) { >>+ *dst++ = c; >>+ continue; >>+ } >>+ if (c < 0x800) { >>+ *dst++ = 0xc0 + (c >> 6); >>+ goto t1; >>+ } >>+ if (c < 0x1) { >>+ *dst++ = 0xe0 + (c >> 12); >>+ goto t2; >>+ } >>+ *dst++ = 0xf0 + (c >> 18); >>+ *dst++ = 0x80 + ((c >> 12) & 0x3f); >>+t2: >>+ *dst++ = 0x80 + ((c >> 6) & 0x3f); >>+t1: >>+ *dst++ = 0x80 + (c & 0x3f); >>+ } >>+ >>+ return dst; >>+} >>+ >>+/* >>+ * Convert the unicode UEFI command line to ASCII to pass to kernel. >> * Size of memory allocated return in *cmd_line_len. >> * Returns NULL on error. >> */ >>-static char *efi_convert_cmdline_to_ascii(efi_system_table_t >>*sys_table_arg, >>-efi_loaded_image_t *image, >>-int *cmd_line_len) >>+static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg, >>+ efi_loaded_image_t *image, >>+ int *cmd_line_len) >> { >>- u16 *s2; >>+ const u16 *s2; >> u8 *s1 = NULL; >> unsigned long cmdline_addr = 0; >> int load_options_size = image->load_options_size / 2; /* ASCII */ >>- void *options = image->load_options; >>- int options_size = 0; >>+ const u16 *options = image->load_options; >>+ int options_bytes = 0; /* UTF-8 bytes */ >>+ int options_chars = 0; /* UTF-16 chars */ >> efi_status_t status; >>- int i; >> u16 zero = 0; >> >> if (options) { >> s2 = options; >>- while (*s2 && *s2 != '\n' && options_size < load_options_size) >>{ >>+ while (*s2 && *s2 != '\n' && options_bytes < >>load_options_size) { >>+ options_bytes +=
Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
Sorry this version is broken and doesn't even compile due to remaining options_size references. Roy Franz wrote: >From: "H. Peter Anvin" > >Improve the conversion of the UTF-16 EFI command line >to UTF-8 for passing to the kernel. > >Signed-off-by: Roy Franz >--- > arch/x86/boot/compressed/eboot.c |3 +- >drivers/firmware/efi/efi-stub-helper.c | 92 > > 2 files changed, 72 insertions(+), 23 deletions(-) > >diff --git a/arch/x86/boot/compressed/eboot.c >b/arch/x86/boot/compressed/eboot.c >index 5e708c0..4723dc89 100644 >--- a/arch/x86/boot/compressed/eboot.c >+++ b/arch/x86/boot/compressed/eboot.c >@@ -486,8 +486,7 @@ struct boot_params *make_boot_params(void *handle, >efi_system_table_t *_table) > hdr->type_of_loader = 0x21; > > /* Convert unicode cmdline to ascii */ >- cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image, >- _size); >+ cmdline_ptr = efi_convert_cmdline(sys_table, image, _size); > if (!cmdline_ptr) > goto fail; > hdr->cmd_line_ptr = (unsigned long)cmdline_ptr; >diff --git a/drivers/firmware/efi/efi-stub-helper.c >b/drivers/firmware/efi/efi-stub-helper.c >index 335d17d..8331892 100644 >--- a/drivers/firmware/efi/efi-stub-helper.c >+++ b/drivers/firmware/efi/efi-stub-helper.c >@@ -548,61 +548,111 @@ static efi_status_t >efi_relocate_kernel(efi_system_table_t *sys_table_arg, > > return status; > } >-/* Convert the unicode UEFI command line to ASCII to pass to kernel. >+ >+/* >+ * Get the number of UTF-8 bytes corresponding to an UTF-16 character. >+ * This overestimates for surrogates, but that is okay. >+ */ >+static int efi_utf8_bytes(u16 c) >+{ >+ return 1 + (c >= 0x80) + (c >= 0x800); >+} >+ >+/* >+ * Convert an UTF-16 string, not necessarily null terminated, to >UTF-8. >+ */ >+static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n) >+{ >+ unsigned int c; >+ >+ while (n--) { >+ c = *src++; >+ if (n && c >= 0xd800 && c <= 0xdbff && >+ *src >= 0xdc00 && *src <= 0xdfff) { >+ c = 0x1 + ((c & 0x3ff) << 10) + (*src & 0x3ff); >+ src++; >+ n--; >+ } >+ if (c >= 0xd800 && c <= 0xdfff) >+ c = 0xfffd; /* Unmatched surrogate */ >+ if (c < 0x80) { >+ *dst++ = c; >+ continue; >+ } >+ if (c < 0x800) { >+ *dst++ = 0xc0 + (c >> 6); >+ goto t1; >+ } >+ if (c < 0x1) { >+ *dst++ = 0xe0 + (c >> 12); >+ goto t2; >+ } >+ *dst++ = 0xf0 + (c >> 18); >+ *dst++ = 0x80 + ((c >> 12) & 0x3f); >+t2: >+ *dst++ = 0x80 + ((c >> 6) & 0x3f); >+t1: >+ *dst++ = 0x80 + (c & 0x3f); >+ } >+ >+ return dst; >+} >+ >+/* >+ * Convert the unicode UEFI command line to ASCII to pass to kernel. > * Size of memory allocated return in *cmd_line_len. > * Returns NULL on error. > */ >-static char *efi_convert_cmdline_to_ascii(efi_system_table_t >*sys_table_arg, >-efi_loaded_image_t *image, >-int *cmd_line_len) >+static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg, >+ efi_loaded_image_t *image, >+ int *cmd_line_len) > { >- u16 *s2; >+ const u16 *s2; > u8 *s1 = NULL; > unsigned long cmdline_addr = 0; > int load_options_size = image->load_options_size / 2; /* ASCII */ >- void *options = image->load_options; >- int options_size = 0; >+ const u16 *options = image->load_options; >+ int options_bytes = 0; /* UTF-8 bytes */ >+ int options_chars = 0; /* UTF-16 chars */ > efi_status_t status; >- int i; > u16 zero = 0; > > if (options) { > s2 = options; >- while (*s2 && *s2 != '\n' && options_size < load_options_size) { >+ while (*s2 && *s2 != '\n' && options_bytes < load_options_size) >{ >+ options_bytes += efi_utf8_bytes(*s2); > s2++; >- options_size++; > } >+ options_chars = s2 - options; > } > >- if (options_size == 0) { >- /* No command line options, so return empty string*/ >- options_size = 1; >+ if (!options_chars) { >+ /* No command line options, so return empty string */ > options = > } > >- options_size++; /* NUL termination */ >+ options_bytes++;/* NUL termination */ >+ > #ifdef CONFIG_ARM > /* For ARM, allocate at a high address to avoid reserved >* regions at low addresses that we
[PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
From: "H. Peter Anvin" Improve the conversion of the UTF-16 EFI command line to UTF-8 for passing to the kernel. Signed-off-by: Roy Franz --- arch/x86/boot/compressed/eboot.c |3 +- drivers/firmware/efi/efi-stub-helper.c | 92 2 files changed, 72 insertions(+), 23 deletions(-) diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index 5e708c0..4723dc89 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -486,8 +486,7 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table) hdr->type_of_loader = 0x21; /* Convert unicode cmdline to ascii */ - cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image, - _size); + cmdline_ptr = efi_convert_cmdline(sys_table, image, _size); if (!cmdline_ptr) goto fail; hdr->cmd_line_ptr = (unsigned long)cmdline_ptr; diff --git a/drivers/firmware/efi/efi-stub-helper.c b/drivers/firmware/efi/efi-stub-helper.c index 335d17d..8331892 100644 --- a/drivers/firmware/efi/efi-stub-helper.c +++ b/drivers/firmware/efi/efi-stub-helper.c @@ -548,61 +548,111 @@ static efi_status_t efi_relocate_kernel(efi_system_table_t *sys_table_arg, return status; } -/* Convert the unicode UEFI command line to ASCII to pass to kernel. + +/* + * Get the number of UTF-8 bytes corresponding to an UTF-16 character. + * This overestimates for surrogates, but that is okay. + */ +static int efi_utf8_bytes(u16 c) +{ + return 1 + (c >= 0x80) + (c >= 0x800); +} + +/* + * Convert an UTF-16 string, not necessarily null terminated, to UTF-8. + */ +static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n) +{ + unsigned int c; + + while (n--) { + c = *src++; + if (n && c >= 0xd800 && c <= 0xdbff && + *src >= 0xdc00 && *src <= 0xdfff) { + c = 0x1 + ((c & 0x3ff) << 10) + (*src & 0x3ff); + src++; + n--; + } + if (c >= 0xd800 && c <= 0xdfff) + c = 0xfffd; /* Unmatched surrogate */ + if (c < 0x80) { + *dst++ = c; + continue; + } + if (c < 0x800) { + *dst++ = 0xc0 + (c >> 6); + goto t1; + } + if (c < 0x1) { + *dst++ = 0xe0 + (c >> 12); + goto t2; + } + *dst++ = 0xf0 + (c >> 18); + *dst++ = 0x80 + ((c >> 12) & 0x3f); +t2: + *dst++ = 0x80 + ((c >> 6) & 0x3f); +t1: + *dst++ = 0x80 + (c & 0x3f); + } + + return dst; +} + +/* + * Convert the unicode UEFI command line to ASCII to pass to kernel. * Size of memory allocated return in *cmd_line_len. * Returns NULL on error. */ -static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg, - efi_loaded_image_t *image, - int *cmd_line_len) +static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg, +efi_loaded_image_t *image, +int *cmd_line_len) { - u16 *s2; + const u16 *s2; u8 *s1 = NULL; unsigned long cmdline_addr = 0; int load_options_size = image->load_options_size / 2; /* ASCII */ - void *options = image->load_options; - int options_size = 0; + const u16 *options = image->load_options; + int options_bytes = 0; /* UTF-8 bytes */ + int options_chars = 0; /* UTF-16 chars */ efi_status_t status; - int i; u16 zero = 0; if (options) { s2 = options; - while (*s2 && *s2 != '\n' && options_size < load_options_size) { + while (*s2 && *s2 != '\n' && options_bytes < load_options_size) { + options_bytes += efi_utf8_bytes(*s2); s2++; - options_size++; } + options_chars = s2 - options; } - if (options_size == 0) { - /* No command line options, so return empty string*/ - options_size = 1; + if (!options_chars) { + /* No command line options, so return empty string */ options = } - options_size++; /* NUL termination */ + options_bytes++;/* NUL termination */ + #ifdef CONFIG_ARM /* For ARM, allocate at a high address to avoid reserved * regions at low addresses that we don't know the specfics of * at the time we are processing the command line. */ - status = efi_high_alloc(sys_table_arg, options_size, 0, + status
[PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
From: H. Peter Anvin h...@zytor.com Improve the conversion of the UTF-16 EFI command line to UTF-8 for passing to the kernel. Signed-off-by: Roy Franz roy.fr...@linaro.org --- arch/x86/boot/compressed/eboot.c |3 +- drivers/firmware/efi/efi-stub-helper.c | 92 2 files changed, 72 insertions(+), 23 deletions(-) diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index 5e708c0..4723dc89 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -486,8 +486,7 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table) hdr-type_of_loader = 0x21; /* Convert unicode cmdline to ascii */ - cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image, - options_size); + cmdline_ptr = efi_convert_cmdline(sys_table, image, options_size); if (!cmdline_ptr) goto fail; hdr-cmd_line_ptr = (unsigned long)cmdline_ptr; diff --git a/drivers/firmware/efi/efi-stub-helper.c b/drivers/firmware/efi/efi-stub-helper.c index 335d17d..8331892 100644 --- a/drivers/firmware/efi/efi-stub-helper.c +++ b/drivers/firmware/efi/efi-stub-helper.c @@ -548,61 +548,111 @@ static efi_status_t efi_relocate_kernel(efi_system_table_t *sys_table_arg, return status; } -/* Convert the unicode UEFI command line to ASCII to pass to kernel. + +/* + * Get the number of UTF-8 bytes corresponding to an UTF-16 character. + * This overestimates for surrogates, but that is okay. + */ +static int efi_utf8_bytes(u16 c) +{ + return 1 + (c = 0x80) + (c = 0x800); +} + +/* + * Convert an UTF-16 string, not necessarily null terminated, to UTF-8. + */ +static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n) +{ + unsigned int c; + + while (n--) { + c = *src++; + if (n c = 0xd800 c = 0xdbff + *src = 0xdc00 *src = 0xdfff) { + c = 0x1 + ((c 0x3ff) 10) + (*src 0x3ff); + src++; + n--; + } + if (c = 0xd800 c = 0xdfff) + c = 0xfffd; /* Unmatched surrogate */ + if (c 0x80) { + *dst++ = c; + continue; + } + if (c 0x800) { + *dst++ = 0xc0 + (c 6); + goto t1; + } + if (c 0x1) { + *dst++ = 0xe0 + (c 12); + goto t2; + } + *dst++ = 0xf0 + (c 18); + *dst++ = 0x80 + ((c 12) 0x3f); +t2: + *dst++ = 0x80 + ((c 6) 0x3f); +t1: + *dst++ = 0x80 + (c 0x3f); + } + + return dst; +} + +/* + * Convert the unicode UEFI command line to ASCII to pass to kernel. * Size of memory allocated return in *cmd_line_len. * Returns NULL on error. */ -static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg, - efi_loaded_image_t *image, - int *cmd_line_len) +static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg, +efi_loaded_image_t *image, +int *cmd_line_len) { - u16 *s2; + const u16 *s2; u8 *s1 = NULL; unsigned long cmdline_addr = 0; int load_options_size = image-load_options_size / 2; /* ASCII */ - void *options = image-load_options; - int options_size = 0; + const u16 *options = image-load_options; + int options_bytes = 0; /* UTF-8 bytes */ + int options_chars = 0; /* UTF-16 chars */ efi_status_t status; - int i; u16 zero = 0; if (options) { s2 = options; - while (*s2 *s2 != '\n' options_size load_options_size) { + while (*s2 *s2 != '\n' options_bytes load_options_size) { + options_bytes += efi_utf8_bytes(*s2); s2++; - options_size++; } + options_chars = s2 - options; } - if (options_size == 0) { - /* No command line options, so return empty string*/ - options_size = 1; + if (!options_chars) { + /* No command line options, so return empty string */ options = zero; } - options_size++; /* NUL termination */ + options_bytes++;/* NUL termination */ + #ifdef CONFIG_ARM /* For ARM, allocate at a high address to avoid reserved * regions at low addresses that we don't know the specfics of * at the time we are processing the command line. */ - status = efi_high_alloc(sys_table_arg, options_size, 0, + status =
Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
Sorry this version is broken and doesn't even compile due to remaining options_size references. Roy Franz roy.fr...@linaro.org wrote: From: H. Peter Anvin h...@zytor.com Improve the conversion of the UTF-16 EFI command line to UTF-8 for passing to the kernel. Signed-off-by: Roy Franz roy.fr...@linaro.org --- arch/x86/boot/compressed/eboot.c |3 +- drivers/firmware/efi/efi-stub-helper.c | 92 2 files changed, 72 insertions(+), 23 deletions(-) diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index 5e708c0..4723dc89 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -486,8 +486,7 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table) hdr-type_of_loader = 0x21; /* Convert unicode cmdline to ascii */ - cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image, - options_size); + cmdline_ptr = efi_convert_cmdline(sys_table, image, options_size); if (!cmdline_ptr) goto fail; hdr-cmd_line_ptr = (unsigned long)cmdline_ptr; diff --git a/drivers/firmware/efi/efi-stub-helper.c b/drivers/firmware/efi/efi-stub-helper.c index 335d17d..8331892 100644 --- a/drivers/firmware/efi/efi-stub-helper.c +++ b/drivers/firmware/efi/efi-stub-helper.c @@ -548,61 +548,111 @@ static efi_status_t efi_relocate_kernel(efi_system_table_t *sys_table_arg, return status; } -/* Convert the unicode UEFI command line to ASCII to pass to kernel. + +/* + * Get the number of UTF-8 bytes corresponding to an UTF-16 character. + * This overestimates for surrogates, but that is okay. + */ +static int efi_utf8_bytes(u16 c) +{ + return 1 + (c = 0x80) + (c = 0x800); +} + +/* + * Convert an UTF-16 string, not necessarily null terminated, to UTF-8. + */ +static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n) +{ + unsigned int c; + + while (n--) { + c = *src++; + if (n c = 0xd800 c = 0xdbff + *src = 0xdc00 *src = 0xdfff) { + c = 0x1 + ((c 0x3ff) 10) + (*src 0x3ff); + src++; + n--; + } + if (c = 0xd800 c = 0xdfff) + c = 0xfffd; /* Unmatched surrogate */ + if (c 0x80) { + *dst++ = c; + continue; + } + if (c 0x800) { + *dst++ = 0xc0 + (c 6); + goto t1; + } + if (c 0x1) { + *dst++ = 0xe0 + (c 12); + goto t2; + } + *dst++ = 0xf0 + (c 18); + *dst++ = 0x80 + ((c 12) 0x3f); +t2: + *dst++ = 0x80 + ((c 6) 0x3f); +t1: + *dst++ = 0x80 + (c 0x3f); + } + + return dst; +} + +/* + * Convert the unicode UEFI command line to ASCII to pass to kernel. * Size of memory allocated return in *cmd_line_len. * Returns NULL on error. */ -static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg, -efi_loaded_image_t *image, -int *cmd_line_len) +static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg, + efi_loaded_image_t *image, + int *cmd_line_len) { - u16 *s2; + const u16 *s2; u8 *s1 = NULL; unsigned long cmdline_addr = 0; int load_options_size = image-load_options_size / 2; /* ASCII */ - void *options = image-load_options; - int options_size = 0; + const u16 *options = image-load_options; + int options_bytes = 0; /* UTF-8 bytes */ + int options_chars = 0; /* UTF-16 chars */ efi_status_t status; - int i; u16 zero = 0; if (options) { s2 = options; - while (*s2 *s2 != '\n' options_size load_options_size) { + while (*s2 *s2 != '\n' options_bytes load_options_size) { + options_bytes += efi_utf8_bytes(*s2); s2++; - options_size++; } + options_chars = s2 - options; } - if (options_size == 0) { - /* No command line options, so return empty string*/ - options_size = 1; + if (!options_chars) { + /* No command line options, so return empty string */ options = zero; } - options_size++; /* NUL termination */ + options_bytes++;/* NUL termination */ + #ifdef CONFIG_ARM /* For ARM, allocate at a high address to avoid reserved * regions at low addresses that we don't know the specfics of * at the time we are processing the command line. */ - status =
Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
On Sun, Sep 22, 2013 at 3:54 PM, H. Peter Anvin h...@zytor.com wrote: Sorry this version is broken and doesn't even compile due to remaining options_size references. I compiled and tested this series on both x86_64 (using OVMF) and on the ARM simulator. I just doubled checked my kernel .config to verify this was not being omitted and I'm pretty sure this doesn't have any compilation problems. I did make a few changes to get the untested version you sent out to compile, but they all seemed to be straightforward typo type fixes. I'll gladly address any defects in this patch, but I don't see an compilation problems. Thanks, Roy Roy Franz roy.fr...@linaro.org wrote: From: H. Peter Anvin h...@zytor.com Improve the conversion of the UTF-16 EFI command line to UTF-8 for passing to the kernel. Signed-off-by: Roy Franz roy.fr...@linaro.org --- arch/x86/boot/compressed/eboot.c |3 +- drivers/firmware/efi/efi-stub-helper.c | 92 2 files changed, 72 insertions(+), 23 deletions(-) diff --git a/arch/x86/boot/compressed/eboot.c b/arch/x86/boot/compressed/eboot.c index 5e708c0..4723dc89 100644 --- a/arch/x86/boot/compressed/eboot.c +++ b/arch/x86/boot/compressed/eboot.c @@ -486,8 +486,7 @@ struct boot_params *make_boot_params(void *handle, efi_system_table_t *_table) hdr-type_of_loader = 0x21; /* Convert unicode cmdline to ascii */ - cmdline_ptr = efi_convert_cmdline_to_ascii(sys_table, image, - options_size); + cmdline_ptr = efi_convert_cmdline(sys_table, image, options_size); if (!cmdline_ptr) goto fail; hdr-cmd_line_ptr = (unsigned long)cmdline_ptr; diff --git a/drivers/firmware/efi/efi-stub-helper.c b/drivers/firmware/efi/efi-stub-helper.c index 335d17d..8331892 100644 --- a/drivers/firmware/efi/efi-stub-helper.c +++ b/drivers/firmware/efi/efi-stub-helper.c @@ -548,61 +548,111 @@ static efi_status_t efi_relocate_kernel(efi_system_table_t *sys_table_arg, return status; } -/* Convert the unicode UEFI command line to ASCII to pass to kernel. + +/* + * Get the number of UTF-8 bytes corresponding to an UTF-16 character. + * This overestimates for surrogates, but that is okay. + */ +static int efi_utf8_bytes(u16 c) +{ + return 1 + (c = 0x80) + (c = 0x800); +} + +/* + * Convert an UTF-16 string, not necessarily null terminated, to UTF-8. + */ +static u8 *efi_utf16_to_utf8(u8 *dst, const u16 *src, int n) +{ + unsigned int c; + + while (n--) { + c = *src++; + if (n c = 0xd800 c = 0xdbff + *src = 0xdc00 *src = 0xdfff) { + c = 0x1 + ((c 0x3ff) 10) + (*src 0x3ff); + src++; + n--; + } + if (c = 0xd800 c = 0xdfff) + c = 0xfffd; /* Unmatched surrogate */ + if (c 0x80) { + *dst++ = c; + continue; + } + if (c 0x800) { + *dst++ = 0xc0 + (c 6); + goto t1; + } + if (c 0x1) { + *dst++ = 0xe0 + (c 12); + goto t2; + } + *dst++ = 0xf0 + (c 18); + *dst++ = 0x80 + ((c 12) 0x3f); +t2: + *dst++ = 0x80 + ((c 6) 0x3f); +t1: + *dst++ = 0x80 + (c 0x3f); + } + + return dst; +} + +/* + * Convert the unicode UEFI command line to ASCII to pass to kernel. * Size of memory allocated return in *cmd_line_len. * Returns NULL on error. */ -static char *efi_convert_cmdline_to_ascii(efi_system_table_t *sys_table_arg, -efi_loaded_image_t *image, -int *cmd_line_len) +static char *efi_convert_cmdline(efi_system_table_t *sys_table_arg, + efi_loaded_image_t *image, + int *cmd_line_len) { - u16 *s2; + const u16 *s2; u8 *s1 = NULL; unsigned long cmdline_addr = 0; int load_options_size = image-load_options_size / 2; /* ASCII */ - void *options = image-load_options; - int options_size = 0; + const u16 *options = image-load_options; + int options_bytes = 0; /* UTF-8 bytes */ + int options_chars = 0; /* UTF-16 chars */ efi_status_t status; - int i; u16 zero = 0; if (options) { s2 = options; - while (*s2 *s2 != '\n' options_size load_options_size) { + while (*s2 *s2 != '\n' options_bytes load_options_size) { + options_bytes += efi_utf8_bytes(*s2); s2++; - options_size++; } + options_chars = s2 - options; } - if (options_size == 0) { - /* No command line options, so
Re: [PATCH 10/18] Do proper conversion from UTF-16 to UTF-8
On 09/22/2013 04:07 PM, Roy Franz wrote: On Sun, Sep 22, 2013 at 3:54 PM, H. Peter Anvin h...@zytor.com wrote: Sorry this version is broken and doesn't even compile due to remaining options_size references. I compiled and tested this series on both x86_64 (using OVMF) and on the ARM simulator. I just doubled checked my kernel .config to verify this was not being omitted and I'm pretty sure this doesn't have any compilation problems. I did make a few changes to get the untested version you sent out to compile, but they all seemed to be straightforward typo type fixes. I'll gladly address any defects in this patch, but I don't see an compilation problems. Ah yes, I see now... you fixed up the compile problem but did so incorrectly. int load_options_size = image-load_options_size / 2; /* ASCII */ This is a number of UTF-16 chars, the comment is completely wrong; - while (*s2 *s2 != '\n' options_size load_options_size) { + while (*s2 *s2 != '\n' options_bytes load_options_size) { + options_bytes += efi_utf8_bytes(*s2); s2++; - options_size++; } + options_chars = s2 - options; You can't compare options_bytes against load_options_size; the latter being a UTF-16 shortword count. So the loop really needs to update options_chars in the loop to compare it against load_options_size: while (*s2 *s2 != '\n' options_chars load_options_size) { options_bytes += efi_utf8_bytes(*s2++); option_chars++; } -hpa -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/