Re: [PATCH v2 4/4] fs: unicode: Add utf8 module and a unicode layer

2021-03-23 Thread Shreeya Patel



On 19/03/21 2:35 am, Eric Biggers wrote:

On Thu, Mar 18, 2021 at 07:03:05PM +0530, Shreeya Patel wrote:

+struct unicode_ops {
+   struct module *owner;
+   int (*validate)(const struct unicode_map *um, const struct qstr *str);
+   int (*strncmp)(const struct unicode_map *um, const struct qstr *s1,
+  const struct qstr *s2);
+   int (*strncasecmp)(const struct unicode_map *um, const struct qstr *s1,
+  const struct qstr *s2);
+   int (*strncasecmp_folded)(const struct unicode_map *um, const struct 
qstr *cf,
+ const struct qstr *s1);
+   int (*normalize)(const struct unicode_map *um, const struct qstr *str,
+unsigned char *dest, size_t dlen);
+   int (*casefold)(const struct unicode_map *um, const struct qstr *str,
+   unsigned char *dest, size_t dlen);
+   int (*casefold_hash)(const struct unicode_map *um, const void *salt,
+struct qstr *str);
+   struct unicode_map* (*load)(const char *version);
+};

Indirect calls are expensive these days, especially due to the Spectre
mitigations.  Would it make sense to use static calls
(include/linux/static_call.h) instead for this?


Hi Eric,

I just sent a v3 with static_call implementation.




- Eric



Re: [PATCH v2 4/4] fs: unicode: Add utf8 module and a unicode layer

2021-03-19 Thread Gabriel Krisman Bertazi
Shreeya Patel  writes:

> On 19/03/21 1:27 am, Gabriel Krisman Bertazi wrote:

>> Maybe, the if leg should be:
>>
>> if (!utf8_ops || !try_module_get(utf8_ops->owner)
>> return ERR_PTR(-ENODEV)
>>
>> But this is still racy, since you are not protecting utf8_ops before
>> acquiring the reference.  If you race with module removal here, a
>> NULL ptr dereference can still occur.  See below.
>
>
> If module is removed before reaching this step, then unicode_unregister
> function would make utf8_ops NULL. So the first condition of if will be true
> and it will return error so how can we have a NULL ptr dereference
> then?

Hi Shreeya,

As we discussed offline, it can happen if the module is deregistered
after checking utf8_ops and before doing the try_module_get.

>>>   }
>>> -EXPORT_SYMBOL(unicode_normalize);
>>> +EXPORT_SYMBOL(unicode_load);
>>>   -static int unicode_parse_version(const char *version, unsigned int
>>> *maj,
>>> -unsigned int *min, unsigned int *rev)
>>> +void unicode_unload(struct unicode_map *um)
>>>   {
>>> -   substring_t args[3];
>>> -   char version_string[12];
>>> -   static const struct match_token token[] = {
>>> -   {1, "%d.%d.%d"},
>>> -   {0, NULL}
>>> -   };
>>> -
>>> -   strscpy(version_string, version, sizeof(version_string));
>>> -
>>> -   if (match_token(version_string, token, args) != 1)
>>> -   return -EINVAL;
>>> +   if (utf8_ops)
>>> +   module_put(utf8_ops->owner);
>>>
>> How can we have a unicode_map to free if utf8_ops is NULL?  that seems
>> to be an invalid use of API, which suggests a bug elsewhere
>> in the kernel.  maybe this should read like this:
>>
>> void unicode_unload(struct unicode_map *um)
>> {
>>if (WARN_ON(!utf8_ops))
>>  return;
>>
>>module_put(utf8_ops->owner);
>>kfree(um);
>> }
>
>
> The reason for adding the check if(utf8_ops) is that some of the filesystem
> calls the unicode_unload function even before calling the unicode_load
> function.
> if we try to decrement the reference without even having the
> reference. ( i.e. not loading the module )
> it would result in kernel panic.
> fs/ext4/super.c
> fs/f2fs/super.c
> Both the above files call the unicode_unload function if CONFIG_UNICODE
> is enabled.
> Not sure if this is an odd behavior or expected.

Those seem to be error paths, where the mount fails before we get a
chance to load the unicode map.  I suggest we fix the callers to avoid
calling the unicode API unnecessarily.

-- 
Gabriel Krisman Bertazi


Re: [PATCH v2 4/4] fs: unicode: Add utf8 module and a unicode layer

2021-03-19 Thread Shreeya Patel



On 19/03/21 1:27 am, Gabriel Krisman Bertazi wrote:

Shreeya,


Hi Gabriel,

Thanks for your detailed review. Please find my comments inline.

utf8data.h_shipped has a large database table which is an auto-generated
decodification trie for the unicode normalization functions.
It is not necessary to carry this large table in the kernel hence make


Thanks for the v2.  This is looking much better!  I have a few comments
inline.

I also have a question about how this patchset was tested.  Did you run
the normalization test (UNICODE_NORMALIZATION_SELFTEST)?  What about
actual filesystems?



Yes I did run the normalization test by loading the utf8-selftest.ko module.
modprobe utf8-selftest


For actual filesystem check I followed the blog post written by you.
https://www.collabora.com/news-and-blog/blog/2020/08/27/using-the-linux-kernel-case-insensitive-feature-in-ext4/
So the basic steps that were followed are as follows :-
1. mkfs -t ext4 -O casefold image.img
2. mount image.img /mnt ( This step would load the module )
3. cd /mnt
4. mkdir CI_dir
5. touch CI_dir/hello_world ( this step would call the inline functions )
6. ls CI_dir/HELLO_WORLD (HELLO_WORLD file should be found if our 
case-insensitive fs is working as expected.)





Minor nit:

"It is not necessary to load this large table in the kernel in the
kernel if no file system is using it"


UTF-8 encoding loadable by converting it into a module.
Also, modify the file called unicode-core which will act as a layer for
unicode subsystem. It will load the UTF-8 module and access it's functions
whenever any filesystem that needs unicode is mounted.

Signed-off-by: Shreeya Patel 
---

Changes in v2
   - Remove the duplicate file utf8-core.c
   - Make the wrapper functions inline.
   - Remove msleep and use try_module_get() and module_put()
 for ensuring that module is loaded correctly and also
 doesn't get unloaded while in use.
  fs/unicode/Kconfig|  11 +-
  fs/unicode/Makefile   |   5 +-
  fs/unicode/unicode-core.c | 229 +--
  fs/unicode/utf8mod.c  | 246 ++
  include/linux/unicode.h   |  73 ---
  5 files changed, 346 insertions(+), 218 deletions(-)
  create mode 100644 fs/unicode/utf8mod.c

diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
index 2c27b9a5cd6c..2961b0206b4d 100644
--- a/fs/unicode/Kconfig
+++ b/fs/unicode/Kconfig
@@ -8,7 +8,16 @@ config UNICODE
  Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
  support.
  
+# UTF-8 encoding can be compiled as a module using UNICODE_UTF8 option.

+# Having UTF-8 encoding as a module will avoid carrying large
+# database table present in utf8data.h_shipped into the kernel
+# by being able to load it only when it is required by the filesystem.
+config UNICODE_UTF8
+   tristate "UTF-8 module"
+   depends on UNICODE
+   default m
+
  config UNICODE_NORMALIZATION_SELFTEST
tristate "Test UTF-8 normalization support"
-   depends on UNICODE
+   depends on UNICODE_UTF8
default n
diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
index fbf9a629ed0d..9dbb04194b32 100644
--- a/fs/unicode/Makefile
+++ b/fs/unicode/Makefile
@@ -1,11 +1,14 @@
  # SPDX-License-Identifier: GPL-2.0
  
  obj-$(CONFIG_UNICODE) += unicode.o

+obj-$(CONFIG_UNICODE_UTF8) += utf8.o
  obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
  
-unicode-y := utf8-norm.o unicode-core.o

+unicode-y := unicode-core.o
+utf8-y := utf8mod.o utf8-norm.o
  
  $(obj)/utf8-norm.o: $(obj)/utf8data.h

+$(obj)/utf8mod.o: $(obj)/utf8-norm.o
  
  # In the normal build, the checked-in utf8data.h is just shipped.

  #
diff --git a/fs/unicode/unicode-core.c b/fs/unicode/unicode-core.c
index 287a8a48836c..945984a3fe9e 100644
--- a/fs/unicode/unicode-core.c
+++ b/fs/unicode/unicode-core.c
@@ -1,235 +1,60 @@
  /* SPDX-License-Identifier: GPL-2.0 */
  #include 
  #include 
-#include 
  #include 
-#include 
  #include 
  #include 
-#include 
  
-#include "utf8n.h"
  
-int unicode_validate(const struct unicode_map *um, const struct qstr *str)

-{
-   const struct utf8data *data = utf8nfdi(um->version);
-
-   if (utf8nlen(data, str->name, str->len) < 0)
-   return -1;
-   return 0;
-}
-EXPORT_SYMBOL(unicode_validate);
-
-int unicode_strncmp(const struct unicode_map *um,
-   const struct qstr *s1, const struct qstr *s2)
-{
-   const struct utf8data *data = utf8nfdi(um->version);
-   struct utf8cursor cur1, cur2;
-   int c1, c2;
-
-   if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
-   return -EINVAL;
-
-   if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
-   return -EINVAL;
-
-   do {
-   c1 = utf8byte(&cur1);
-   c2 = utf8byte(&cur2);
-
-   if (c1 < 0 || c2 < 0)
-   return -EINVAL;
-   if (c1 != c2)
-   r

Re: [PATCH v2 4/4] fs: unicode: Add utf8 module and a unicode layer

2021-03-18 Thread Eric Biggers
On Thu, Mar 18, 2021 at 07:03:05PM +0530, Shreeya Patel wrote:
> +struct unicode_ops {
> + struct module *owner;
> + int (*validate)(const struct unicode_map *um, const struct qstr *str);
> + int (*strncmp)(const struct unicode_map *um, const struct qstr *s1,
> +const struct qstr *s2);
> + int (*strncasecmp)(const struct unicode_map *um, const struct qstr *s1,
> +const struct qstr *s2);
> + int (*strncasecmp_folded)(const struct unicode_map *um, const struct 
> qstr *cf,
> +   const struct qstr *s1);
> + int (*normalize)(const struct unicode_map *um, const struct qstr *str,
> +  unsigned char *dest, size_t dlen);
> + int (*casefold)(const struct unicode_map *um, const struct qstr *str,
> + unsigned char *dest, size_t dlen);
> + int (*casefold_hash)(const struct unicode_map *um, const void *salt,
> +  struct qstr *str);
> + struct unicode_map* (*load)(const char *version);
> +};

Indirect calls are expensive these days, especially due to the Spectre
mitigations.  Would it make sense to use static calls
(include/linux/static_call.h) instead for this?

- Eric


Re: [PATCH v2 4/4] fs: unicode: Add utf8 module and a unicode layer

2021-03-18 Thread Gabriel Krisman Bertazi


Shreeya,


> utf8data.h_shipped has a large database table which is an auto-generated
> decodification trie for the unicode normalization functions.
> It is not necessary to carry this large table in the kernel hence make
>

Thanks for the v2.  This is looking much better!  I have a few comments
inline.

I also have a question about how this patchset was tested.  Did you run
the normalization test (UNICODE_NORMALIZATION_SELFTEST)?  What about
actual filesystems?

Minor nit:

"It is not necessary to load this large table in the kernel in the
kernel if no file system is using it"

> UTF-8 encoding loadable by converting it into a module.
> Also, modify the file called unicode-core which will act as a layer for
> unicode subsystem. It will load the UTF-8 module and access it's functions
> whenever any filesystem that needs unicode is mounted.
>
> Signed-off-by: Shreeya Patel 
> ---
>
> Changes in v2
>   - Remove the duplicate file utf8-core.c
>   - Make the wrapper functions inline.
>   - Remove msleep and use try_module_get() and module_put()
> for ensuring that module is loaded correctly and also
> doesn't get unloaded while in use.

>  fs/unicode/Kconfig|  11 +-
>  fs/unicode/Makefile   |   5 +-
>  fs/unicode/unicode-core.c | 229 +--
>  fs/unicode/utf8mod.c  | 246 ++
>  include/linux/unicode.h   |  73 ---
>  5 files changed, 346 insertions(+), 218 deletions(-)
>  create mode 100644 fs/unicode/utf8mod.c
>
> diff --git a/fs/unicode/Kconfig b/fs/unicode/Kconfig
> index 2c27b9a5cd6c..2961b0206b4d 100644
> --- a/fs/unicode/Kconfig
> +++ b/fs/unicode/Kconfig
> @@ -8,7 +8,16 @@ config UNICODE
> Say Y here to enable UTF-8 NFD normalization and NFD+CF casefolding
> support.
>  
> +# UTF-8 encoding can be compiled as a module using UNICODE_UTF8 option.
> +# Having UTF-8 encoding as a module will avoid carrying large
> +# database table present in utf8data.h_shipped into the kernel
> +# by being able to load it only when it is required by the filesystem.
> +config UNICODE_UTF8
> + tristate "UTF-8 module"
> + depends on UNICODE
> + default m
> +
>  config UNICODE_NORMALIZATION_SELFTEST
>   tristate "Test UTF-8 normalization support"
> - depends on UNICODE
> + depends on UNICODE_UTF8
>   default n
> diff --git a/fs/unicode/Makefile b/fs/unicode/Makefile
> index fbf9a629ed0d..9dbb04194b32 100644
> --- a/fs/unicode/Makefile
> +++ b/fs/unicode/Makefile
> @@ -1,11 +1,14 @@
>  # SPDX-License-Identifier: GPL-2.0
>  
>  obj-$(CONFIG_UNICODE) += unicode.o
> +obj-$(CONFIG_UNICODE_UTF8) += utf8.o
>  obj-$(CONFIG_UNICODE_NORMALIZATION_SELFTEST) += utf8-selftest.o
>  
> -unicode-y := utf8-norm.o unicode-core.o
> +unicode-y := unicode-core.o
> +utf8-y := utf8mod.o utf8-norm.o
>  
>  $(obj)/utf8-norm.o: $(obj)/utf8data.h
> +$(obj)/utf8mod.o: $(obj)/utf8-norm.o
>  
>  # In the normal build, the checked-in utf8data.h is just shipped.
>  #
> diff --git a/fs/unicode/unicode-core.c b/fs/unicode/unicode-core.c
> index 287a8a48836c..945984a3fe9e 100644
> --- a/fs/unicode/unicode-core.c
> +++ b/fs/unicode/unicode-core.c
> @@ -1,235 +1,60 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>  #include 
>  #include 
> -#include 
>  #include 
> -#include 
>  #include 
>  #include 
> -#include 
>  
> -#include "utf8n.h"
>  
> -int unicode_validate(const struct unicode_map *um, const struct qstr *str)
> -{
> - const struct utf8data *data = utf8nfdi(um->version);
> -
> - if (utf8nlen(data, str->name, str->len) < 0)
> - return -1;
> - return 0;
> -}
> -EXPORT_SYMBOL(unicode_validate);
> -
> -int unicode_strncmp(const struct unicode_map *um,
> - const struct qstr *s1, const struct qstr *s2)
> -{
> - const struct utf8data *data = utf8nfdi(um->version);
> - struct utf8cursor cur1, cur2;
> - int c1, c2;
> -
> - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
> - return -EINVAL;
> -
> - if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
> - return -EINVAL;
> -
> - do {
> - c1 = utf8byte(&cur1);
> - c2 = utf8byte(&cur2);
> -
> - if (c1 < 0 || c2 < 0)
> - return -EINVAL;
> - if (c1 != c2)
> - return 1;
> - } while (c1);
> -
> - return 0;
> -}
> -EXPORT_SYMBOL(unicode_strncmp);
> -
> -int unicode_strncasecmp(const struct unicode_map *um,
> - const struct qstr *s1, const struct qstr *s2)
> -{
> - const struct utf8data *data = utf8nfdicf(um->version);
> - struct utf8cursor cur1, cur2;
> - int c1, c2;
> +struct unicode_ops *utf8_ops;
> +EXPORT_SYMBOL(utf8_ops);
>  
> - if (utf8ncursor(&cur1, data, s1->name, s1->len) < 0)
> - return -EINVAL;
> -
> - if (utf8ncursor(&cur2, data, s2->name, s2->len) < 0)
> - return -EINVAL;
> -
> - do {
> - c1 = utf8byte(&cur1