Re: [PATCH] net/core/flow.c: compare data with memcmp

2007-01-01 Thread Daniel Marjamäki

Hello!

I have done a little testing on my own. My results is that memcpy is
many times faster even with aligned data.

I am testing in an ordinary console program. I am including the code below.
If I'm doing something wrong, please tell me so.

As you can see I am not using the same datadeclarations as the kernel
but I'm testing the algorithm here not the data. By testing various
data and types of data I can make sure the algorithm behaves correctly
in all situations.
The datamember 'd' in flowi is not part of the comparison, but by
changing it into an 'unsigned int' it becomes part of the comparison.



const int NUM_REP = 0x7FFF;

typedef unsigned int flow_compare_t;

struct flowi {
unsigned int a,b,c;
unsigned char d;
};


/* I hear what you're saying, use memcmp.  But memcmp cannot make
* important assumptions that we can here, such as alignment and
* constant size.
*/
static int flow_key_compare(struct flowi *key1, struct flowi *key2)
{
flow_compare_t *k1, *k1_lim, *k2;
const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);

k1 = (flow_compare_t *) key1;
k1_lim = k1 + n_elem;

k2 = (flow_compare_t *) key2;

do {
if (*k1++ != *k2++)
return 1;
} while (k1 < k1_lim);

return 0;
}



static int flow_key_compare2(struct flowi *key1, struct flowi *key2)
{
return memcmp(key1, key2, (sizeof(struct flowi) /
sizeof(flow_compare_t)) * sizeof(flow_compare_t));
}






int main()
{

struct flowi key1 = {0,1,2,3};
struct flowi key2 = {0,1,2,0};
char str[300];
int i;

/* put data in aligned addresses */
struct flowi *k1 = (struct flowi *)((int)([100]) & 0xFFF0);
struct flowi *k2 = (struct flowi *)((int)([200]) & 0xFFF0);
memcpy(k1, , sizeof(struct flowi));
memcpy(k2, , sizeof(struct flowi));

/* Compare data */
printf("compare1..\n");
for (i = 0; i < NUM_REP; i++)
flow_key_compare(k1, k2);

printf("compare2..\n");
for (i = 0; i < NUM_REP; i++)
flow_key_compare2(k1, k2);


printf((flow_key_compare(k1,k2)==(flow_key_compare2(k1,k2)?1:0))?"ok\n":"error\n");

return 0;
}











2007/1/1, Daniel Marjamäki <[EMAIL PROTECTED]>:

Hello!

So you mean that in this particular case it's faster with a handcoded
comparison than memcmp? Because both key1 and key2 are located at
word-aligned addresses?
That's fascinating.

Best regards,
Daniel

2006/12/31, David Miller <[EMAIL PROTECTED]>:
> From: "Daniel_Marjamäki" <[EMAIL PROTECTED]>
> Date: Sun, 31 Dec 2006 17:37:05 +0100
>
> > From: Daniel Marjamäki
> > This has been tested by me.
> > Signed-off-by: Daniel Marjamäki <[EMAIL PROTECTED]>
>
> Please do not do this.
>
> memcmp() cannot assume the alignment of the source and
> destination buffers and thus will run more slowly than
> that open-coded comparison.
>
> That code was done like that on purpose because it is
> one of the most critical paths in the networking flow
> cache lookup which runs for every IPSEC packet going
> throught the system.
>


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net/core/flow.c: compare data with memcmp

2007-01-01 Thread Daniel Marjamäki

Hello!

I have done a little testing on my own. My results is that memcpy is
many times faster even with aligned data.

I am testing in an ordinary console program. I am including the code below.
If I'm doing something wrong, please tell me so.

As you can see I am not using the same datadeclarations as the kernel
but I'm testing the algorithm here not the data. By testing various
data and types of data I can make sure the algorithm behaves correctly
in all situations.
The datamember 'd' in flowi is not part of the comparison, but by
changing it into an 'unsigned int' it becomes part of the comparison.



const int NUM_REP = 0x7FFF;

typedef unsigned int flow_compare_t;

struct flowi {
unsigned int a,b,c;
unsigned char d;
};


/* I hear what you're saying, use memcmp.  But memcmp cannot make
* important assumptions that we can here, such as alignment and
* constant size.
*/
static int flow_key_compare(struct flowi *key1, struct flowi *key2)
{
flow_compare_t *k1, *k1_lim, *k2;
const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);

k1 = (flow_compare_t *) key1;
k1_lim = k1 + n_elem;

k2 = (flow_compare_t *) key2;

do {
if (*k1++ != *k2++)
return 1;
} while (k1  k1_lim);

return 0;
}



static int flow_key_compare2(struct flowi *key1, struct flowi *key2)
{
return memcmp(key1, key2, (sizeof(struct flowi) /
sizeof(flow_compare_t)) * sizeof(flow_compare_t));
}






int main()
{

struct flowi key1 = {0,1,2,3};
struct flowi key2 = {0,1,2,0};
char str[300];
int i;

/* put data in aligned addresses */
struct flowi *k1 = (struct flowi *)((int)(str[100])  0xFFF0);
struct flowi *k2 = (struct flowi *)((int)(str[200])  0xFFF0);
memcpy(k1, key1, sizeof(struct flowi));
memcpy(k2, key2, sizeof(struct flowi));

/* Compare data */
printf(compare1..\n);
for (i = 0; i  NUM_REP; i++)
flow_key_compare(k1, k2);

printf(compare2..\n);
for (i = 0; i  NUM_REP; i++)
flow_key_compare2(k1, k2);


printf((flow_key_compare(k1,k2)==(flow_key_compare2(k1,k2)?1:0))?ok\n:error\n);

return 0;
}











2007/1/1, Daniel Marjamäki [EMAIL PROTECTED]:

Hello!

So you mean that in this particular case it's faster with a handcoded
comparison than memcmp? Because both key1 and key2 are located at
word-aligned addresses?
That's fascinating.

Best regards,
Daniel

2006/12/31, David Miller [EMAIL PROTECTED]:
 From: Daniel_Marjamäki [EMAIL PROTECTED]
 Date: Sun, 31 Dec 2006 17:37:05 +0100

  From: Daniel Marjamäki
  This has been tested by me.
  Signed-off-by: Daniel Marjamäki [EMAIL PROTECTED]

 Please do not do this.

 memcmp() cannot assume the alignment of the source and
 destination buffers and thus will run more slowly than
 that open-coded comparison.

 That code was done like that on purpose because it is
 one of the most critical paths in the networking flow
 cache lookup which runs for every IPSEC packet going
 throught the system.



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net/core/flow.c: compare data with memcmp

2006-12-31 Thread Daniel Marjamäki

Hello!

So you mean that in this particular case it's faster with a handcoded
comparison than memcmp? Because both key1 and key2 are located at
word-aligned addresses?
That's fascinating.

Best regards,
Daniel

2006/12/31, David Miller <[EMAIL PROTECTED]>:

From: "Daniel_Marjamäki" <[EMAIL PROTECTED]>
Date: Sun, 31 Dec 2006 17:37:05 +0100

> From: Daniel Marjamäki
> This has been tested by me.
> Signed-off-by: Daniel Marjamäki <[EMAIL PROTECTED]>

Please do not do this.

memcmp() cannot assume the alignment of the source and
destination buffers and thus will run more slowly than
that open-coded comparison.

That code was done like that on purpose because it is
one of the most critical paths in the networking flow
cache lookup which runs for every IPSEC packet going
throught the system.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] net/core/flow.c: compare data with memcmp

2006-12-31 Thread Daniel Marjamäki

From: Daniel Marjamäki
This has been tested by me.
Signed-off-by: Daniel Marjamäki <[EMAIL PROTECTED]>
--- linux-2.6.20-rc2/net/core/flow.c2006-12-27 09:59:56.0 +0100
+++ linux/net/core/flow.c   2006-12-31 18:26:06.0 +0100
@@ -144,29 +144,16 @@ typedef u32 flow_compare_t;

extern void flowi_is_missized(void);

-/* I hear what you're saying, use memcmp.  But memcmp cannot make
- * important assumptions that we can here, such as alignment and
- * constant size.
- */
static int flow_key_compare(struct flowi *key1, struct flowi *key2)
{
-   flow_compare_t *k1, *k1_lim, *k2;
-   const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);
-
if (sizeof(struct flowi) % sizeof(flow_compare_t))
flowi_is_missized();

-   k1 = (flow_compare_t *) key1;
-   k1_lim = k1 + n_elem;
-
-   k2 = (flow_compare_t *) key2;
-
-   do {
-   if (*k1++ != *k2++)
-   return 1;
-   } while (k1 < k1_lim);
+   /* Number of elements to compare */
+   const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);

-   return 0;
+   /* Compare all elements in key1 and key2. */
+   return memcmp(key1, key2, n_elem * sizeof(flow_compare_t));
}

void *flow_cache_lookup(struct flowi *key, u16 family, u8 dir,
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH] net/core/flow.c: compare data with memcmp

2006-12-31 Thread Daniel Marjamäki

From: Daniel Marjamäki
This has been tested by me.
Signed-off-by: Daniel Marjamäki [EMAIL PROTECTED]
--- linux-2.6.20-rc2/net/core/flow.c2006-12-27 09:59:56.0 +0100
+++ linux/net/core/flow.c   2006-12-31 18:26:06.0 +0100
@@ -144,29 +144,16 @@ typedef u32 flow_compare_t;

extern void flowi_is_missized(void);

-/* I hear what you're saying, use memcmp.  But memcmp cannot make
- * important assumptions that we can here, such as alignment and
- * constant size.
- */
static int flow_key_compare(struct flowi *key1, struct flowi *key2)
{
-   flow_compare_t *k1, *k1_lim, *k2;
-   const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);
-
if (sizeof(struct flowi) % sizeof(flow_compare_t))
flowi_is_missized();

-   k1 = (flow_compare_t *) key1;
-   k1_lim = k1 + n_elem;
-
-   k2 = (flow_compare_t *) key2;
-
-   do {
-   if (*k1++ != *k2++)
-   return 1;
-   } while (k1  k1_lim);
+   /* Number of elements to compare */
+   const int n_elem = sizeof(struct flowi) / sizeof(flow_compare_t);

-   return 0;
+   /* Compare all elements in key1 and key2. */
+   return memcmp(key1, key2, n_elem * sizeof(flow_compare_t));
}

void *flow_cache_lookup(struct flowi *key, u16 family, u8 dir,
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] net/core/flow.c: compare data with memcmp

2006-12-31 Thread Daniel Marjamäki

Hello!

So you mean that in this particular case it's faster with a handcoded
comparison than memcmp? Because both key1 and key2 are located at
word-aligned addresses?
That's fascinating.

Best regards,
Daniel

2006/12/31, David Miller [EMAIL PROTECTED]:

From: Daniel_Marjamäki [EMAIL PROTECTED]
Date: Sun, 31 Dec 2006 17:37:05 +0100

 From: Daniel Marjamäki
 This has been tested by me.
 Signed-off-by: Daniel Marjamäki [EMAIL PROTECTED]

Please do not do this.

memcmp() cannot assume the alignment of the source and
destination buffers and thus will run more slowly than
that open-coded comparison.

That code was done like that on purpose because it is
one of the most critical paths in the networking flow
cache lookup which runs for every IPSEC packet going
throught the system.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Want comments regarding patch

2006-12-28 Thread Daniel Marjamäki

Hello!

Thank you for your comments. It seems to me the issue was the readability.

It was my goal to improve the readability. I failed.

I personally prefer to use standard functions instead of writing code.
In my opinion using standard functions means less code that is easier to read.

Best regards,
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Want comments regarding patch

2006-12-28 Thread Daniel Marjamäki

Hello all!

I sent a patch with this content:

-   for (i = 0; i < MAX_PIRQS; i++)
-   pirq_entries[i] = -1;
+   memset(pirq_entries, -1, sizeof(pirq_entries));

I'd like to know if you have any comments to this change. It was of
course my intention to make the code shorter, simpler and faster.

I've discussed this with Ingo Molnar and here's our conversation:

INGO:

hm, i'm not sure i like this - the '-1' in the memset is for a byte,
while the pirq_entries are word sized. It should work because the bytes
happen to be 0xff for the word too, but this is encodes an assumption,
and were we ever to change that value it could break silently. gcc ought
to be able to figure out the best way to initialize the array.


DANIEL:

Thank you for the comments.

I understand your point, it's good. But I personally still like my
method better.
For me -1 is just as valid as an argument as 0. As you note however,
it assumes that the next developer understands the encoding of
negative numbers. A developer who doesn't know the encoding could be
very confused. Would my patch be ok if I used '0xff' instead of '-1'?

With version 3.3.6 (gcc) there's quite a big difference in the
assembly code (between 'for' and 'memset').

INGO:

0xff might be better, but i'm still uneasy about it ... No other piece
of code within the kernel does this.

could you post the patch and the reasoning to
linux-kernel@vger.kernel.org as well? That way others can chime in as
well.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Want comments regarding patch

2006-12-28 Thread Daniel Marjamäki

Hello!

Thank you for your comments. It seems to me the issue was the readability.

It was my goal to improve the readability. I failed.

I personally prefer to use standard functions instead of writing code.
In my opinion using standard functions means less code that is easier to read.

Best regards,
Daniel
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Want comments regarding patch

2006-12-28 Thread Daniel Marjamäki

Hello all!

I sent a patch with this content:

-   for (i = 0; i  MAX_PIRQS; i++)
-   pirq_entries[i] = -1;
+   memset(pirq_entries, -1, sizeof(pirq_entries));

I'd like to know if you have any comments to this change. It was of
course my intention to make the code shorter, simpler and faster.

I've discussed this with Ingo Molnar and here's our conversation:

INGO:

hm, i'm not sure i like this - the '-1' in the memset is for a byte,
while the pirq_entries are word sized. It should work because the bytes
happen to be 0xff for the word too, but this is encodes an assumption,
and were we ever to change that value it could break silently. gcc ought
to be able to figure out the best way to initialize the array.


DANIEL:

Thank you for the comments.

I understand your point, it's good. But I personally still like my
method better.
For me -1 is just as valid as an argument as 0. As you note however,
it assumes that the next developer understands the encoding of
negative numbers. A developer who doesn't know the encoding could be
very confused. Would my patch be ok if I used '0xff' instead of '-1'?

With version 3.3.6 (gcc) there's quite a big difference in the
assembly code (between 'for' and 'memset').

INGO:

0xff might be better, but i'm still uneasy about it ... No other piece
of code within the kernel does this.

could you post the patch and the reasoning to
linux-kernel@vger.kernel.org as well? That way others can chime in as
well.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/