On 5/24/24 2:41 AM, Mariam Arutunian wrote:
Add two new internal functions (IFN_CRC, IFN_CRC_REV), to provide faster
CRC generation.
One performs bit-forward and the other bit-reversed CRC computation.
If CRC optabs are supported, they are used for the CRC computation.
Otherwise, table-based CRC is generated.
The supported data and CRC sizes are 8, 16, 32, and 64 bits.
The polynomial is without the leading 1.
A table with 256 elements is used to store precomputed CRCs.
For the reflection of inputs and the output, a simple algorithm involving
SHIFT, AND, and OR operations is used.
Co-authored-by: Joern Rennecke <joern.renne...@embecosm.com
<mailto:joern.renne...@embecosm.com>>
gcc/
* doc/md.texi (crc@var{m}@var{n}4,
crc_rev@var{m}@var{n}4): Document.
* expr.cc (generate_crc_table): New function.
(calculate_table_based_CRC): Likewise.
(expand_crc_table_based): Likewise.
(gen_common_operation_to_reflect): Likewise.
(reflect_64_bit_value): Likewise.
(reflect_32_bit_value): Likewise.
(reflect_16_bit_value): Likewise.
(reflect_8_bit_value): Likewise.
(generate_reflecting_code_standard): Likewise.
(expand_reversed_crc_table_based): Likewise.
* expr.h (generate_reflecting_code_standard): New function declaration.
(expand_crc_table_based): Likewise.
(expand_reversed_crc_table_based): Likewise.
* internal-fn.cc: (crc_direct): Define.
(direct_crc_optab_supported_p): Likewise.
(expand_crc_optab_fn): New function
* internal-fn.def (CRC, CRC_REV): New internal functions.
* optabs.def (crc_optab, crc_rev_optab): New optabs.
Signed-off-by: Mariam Arutunian <mariamarutun...@gmail.com
<mailto:mariamarutun...@gmail.com>>
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 5730bda80dc..be68ef860f9 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8557,6 +8557,20 @@ operand 2, greater than operand 2 or is unordered with
operand 2.
This pattern is not allowed to @code{FAIL}.
+@cindex @code{crc@var{m}@var{n}4} instruction pattern
+@item @samp{crc@var{m}@var{n}4}
+Calculate a bit-forward CRC using operands 1, 2 and 3,
+then store the result in operand 0.
+Operands 1 is the initial CRC, operands 2 is the data and operands 3 is the
+polynomial without leading 1.
+Operands 0, 1 and 3 have mode @var{n} and operand 2 has mode @var{m}, where
+both modes are integers. The size of CRC to be calculated is determined by the
+mode; for example, if @var{n} is 'hi', a CRC16 is calculated.
+
+@cindex @code{crc_rev@var{m}@var{n}4} instruction pattern
+@item @samp{crc_rev@var{m}@var{n}4}
+Similar to @samp{crc@var{m}@var{n}4}, but calculates a bit-reversed CRC.
+
So just to be clear, this is a case where the input (operand 2) may have
a different mode than the output (operand 0). That scenario is
generally discouraged, with a few exceptions (the most common being
shift counts which are often QImode objects while the
value-to-be-shifted and the output value are potentially any scalar
integer mode.
So I don't think this is a problem, just wanted to point it out to
anyone else that may be looking at this code.
@end table
@end ifset
diff --git a/gcc/expr.cc b/gcc/expr.cc
index 1baa39b98eb..18368ae6b6c 100644
--- a/gcc/expr.cc
+++ b/gcc/expr.cc
@@ -14091,3 +14091,359 @@ int_expr_size (const_tree exp)
return tree_to_shwi (size);
}
+
+/* Calculate CRC for the initial CRC and given POLYNOMIAL.
+ CRC_BITS is CRC size. */
+
+static unsigned HOST_WIDE_INT
+calculate_crc (unsigned HOST_WIDE_INT crc,
+ unsigned HOST_WIDE_INT polynomial,
+ unsigned crc_bits)
Just a nit. Line up the polynomial & crc_bits declarations with the crc
declaration.
+{
+ crc = crc << (crc_bits - 8);
+ for (int i = 8; i > 0; --i)
+ {
+ if ((crc >> (crc_bits - 1)) & 1)
+ crc = (crc << 1) ^ polynomial;
+ else
+ crc <<= 1;
+ }
+
+ crc <<= (sizeof (crc) * BITS_PER_UNIT - crc_bits);
+ crc >>= (sizeof (crc) * BITS_PER_UNIT - crc_bits);
Another nit. Just once space after the <<= or >>= operators.
+
+ return crc;
+}
+
+/* Assemble CRC table with 256 elements for the given POLYNOM and CRC_BITS with
+ given ID.
+ ID is the identifier of the table, the name of the table is unique,
+ contains CRC size and the polynomial.
+ POLYNOM is the polynomial used to calculate the CRC table's elements.
+ CRC_BITS is the size of CRC, may be 8, 16, ... . */
+
+rtx
+assemble_crc_table (tree id, unsigned HOST_WIDE_INT polynom, unsigned crc_bits)
+{
+ unsigned table_el_n = 0x100;
+ tree ar = build_array_type (make_unsigned_type (crc_bits),
+ build_index_type (size_int (table_el_n - 1)));
Nit. Line up build_index_type at the same indention as make_unsigned_type.
Note that with TREE_READONLY set, there is at least some chance that the
linker will find identical tables and merge them. I haven't tested
this, but I know it happens for other objects in the constant pools.
+ sprintf (buf, "crc_table_for_crc_%u_polynomial_" HOST_WIDE_INT_PRINT_HEX,
+ crc_bits, polynom);
Another formatting nit. Line up the arguments when you need to wrap a
function call. ie
foo (arg1, arg2....
arg3, arg4....)
+
+/* Generate table-based CRC code for the given CRC, INPUT_DATA and the
+ POLYNOMIAL (without leading 1).
+
+ First, using POLYNOMIAL's value generates CRC table of 256 elements,
+ then generates the assembly for the following code,
+ where crc_size and data_size may be 8, 16, 32, 64, depending on CRC:
+
+ for (int i = 0; i < data_size / 8; i++)
+ crc = (crc << 8) ^ crc_table[(crc >> (crc_size - 8))
+ ^ (data >> (data_size - (i + 1) * 8) & 0xFF))];
+
+ So to take values from the table, we need 8-bit data.
+ If input data size is not 8, then first we extract upper 8 bits,
+ then the other 8 bits, and so on. */
+
+void
+calculate_table_based_CRC (rtx *crc, const rtx &input_data,
+ const rtx &polynomial,
+ machine_mode crc_mode, machine_mode data_mode)
Same nit as before. Line up the parameters. It looks like we need to
check for that throughout this function when you wrapped function
argument lists.
+
+/* Generate table-based CRC code for the given CRC, INPUT_DATA and the
+ POLYNOMIAL (without leading 1).
+
+ CRC is OP1, data is OP2 and the polynomial is OP3.
+ This must generate a CRC table and an assembly for the following code,
+ where crc_size and data_size may be 8, 16, 32, 64:
+ uint_crc_size_t
+ crc_crc_size (uint_crc_size_t crc_init, uint_data_size_t data, size_t size)
+ {
+ uint_crc_size_t crc = crc_init;
+ for (int i = 0; i < data_size / 8; i++)
+ crc = (crc << 8)
+ ^ crc_table[(crc >> (crc_size - 8))
+ ^ (data >> (data_size - (i + 1) * 8) & 0xFF))];
+ return crc;
+ } */
+
+void
+expand_crc_table_based (rtx op0, rtx op1, rtx op2, rtx op3,
+ machine_mode data_mode)
+{
+ gcc_assert (!CONST_INT_P (op0));
+ gcc_assert (CONST_INT_P (op3));
+ machine_mode crc_mode = GET_MODE (op0);
+ rtx crc = gen_reg_rtx (word_mode);
+ convert_move (crc, op1, 0);
+ calculate_table_based_CRC (&crc, op2, op3, crc_mode, data_mode);
+ if (crc_mode == SImode && word_mode == DImode)
+ {
+ rtx a_low = gen_lowpart_SUBREG (crc_mode, crc);
+ crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
+ }
+ rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0);
The explicit checks for SI/DI mode don't look right here. What I think
you're looking for is crc_mode < word_mode. If you wanted to be even
omre precise you could check the sizes of the two modes.
Along the same lines, I don't think you want/need the call to
simplify_gen_subreg when word_mode == crc_mode. ISTM that you only want
the subreg when crc_mode < word_mode.
Presumably we rejected the optimization earlier if crc_mode > word_mode?
+ emit_move_insn (tgt, crc);
+}
+
+/* Generate the common operation for reflecting values:
+ *OP = (*OP & AND1_VALUE) << SHIFT_VAL | (*OP & AND2_VALUE) >> SHIFT_VAL; */
+
+void
+gen_common_operation_to_reflect (rtx *op,
+ unsigned HOST_WIDE_INT and1_value,
+ unsigned HOST_WIDE_INT and2_value,
+ unsigned shift_val)
+{
+ rtx op1 = expand_and (word_mode, *op, gen_int_mode (and1_value, word_mode),
+ NULL_RTX);
+ op1 = expand_shift (LSHIFT_EXPR, word_mode, op1, shift_val, op1, 0);
+ rtx op2 = expand_and (word_mode, *op, gen_int_mode (and2_value, word_mode),
+ NULL_RTX);
+ op2 = expand_shift (RSHIFT_EXPR, word_mode, op2, shift_val, op2, 1);
+ *op = expand_binop (word_mode, ior_optab, op1, op2, *op, 0, OPTAB_DIRECT);
+}
Same nits as elsewhere. Line up the parameters & arguments when you
need to wrap lines.
+
+/* Reflect 64-bit value for the 64-bit target. */
+
+void
+reflect_64_bit_value (rtx *op)
+{
+ gen_common_operation_to_reflect (op, 0x00000000FFFFFFFF, 0xFFFFFFFF00000000,
+ 32);
+ gen_common_operation_to_reflect (op, 0x0000FFFF0000FFFF, 0xFFFF0000FFFF0000,
+ 16);
+ gen_common_operation_to_reflect (op, 0x00FF00FF00FF00FF, 0xFF00FF00FF00FF00,
+ 8);
+ gen_common_operation_to_reflect (op, 0x0F0F0F0F0F0F0F0F, 0xF0F0F0F0F0F0F0F0,
+ 4);
+ gen_common_operation_to_reflect (op, 0x3333333333333333, 0xCCCCCCCCCCCCCCCC,
+ 2);
+ gen_common_operation_to_reflect (op, 0x5555555555555555, 0xAAAAAAAAAAAAAAAA,
+ 1);
Another class of nits. In general, if you're wrapping a long line like
these above, consider "balancing" the length of the lines a bit better.
so instead of bringing down just the final constant, bring down the
final two constants to a new line. I think that's what we generally do
elsehwere.
+
+ gen_reflecting_code (&crc, GET_MODE_BITSIZE (word_mode) - crc_size);
+ if (crc_mode == SImode && word_mode == DImode)
+ {
+ rtx a_low = gen_lowpart_SUBREG (crc_mode, crc);
+ crc = gen_rtx_SIGN_EXTEND (word_mode, a_low);
+ }
+ rtx tgt = simplify_gen_subreg (word_mode, op0, crc_mode, 0);
+ emit_move_insn (tgt, crc);
Same concerns as in the bit-forward implementation WRT testing SI/DI
explicitly. Testing crc_mode < word_mode or testing their bitsizes
seems safer.
+/* Expand CRC call STMT. */
+
+static void
+expand_crc_optab_fn (internal_fn fn, gcall *stmt, convert_optab optab)
+{
+ tree lhs = gimple_call_lhs (stmt);
+ tree rhs1 = gimple_call_arg (stmt, 0); // crc
+ tree rhs2 = gimple_call_arg (stmt, 1); // data
+ tree rhs3 = gimple_call_arg (stmt, 2); // polynomial
+
+ tree result_type = TREE_TYPE (lhs);
+ tree data_type = TREE_TYPE (rhs2);
+
+ gcc_assert (TYPE_MODE (result_type) >= TYPE_MODE (data_type));
+ gcc_assert (word_mode >= TYPE_MODE (result_type));
+
+ rtx dest = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+ rtx crc = expand_normal (rhs1);
+ rtx data = expand_normal (rhs2);
+ gcc_assert (TREE_CODE (rhs3) == INTEGER_CST);
+ rtx polynomial = gen_rtx_CONST_INT (TYPE_MODE (result_type),
+ TREE_INT_CST_LOW (rhs3));
Nit. Looks like this code got over-indented. Just two spaces of
indention after the open-curly.
In general, most of the concerns with this patch are formatting nits.
The functional concerns are with those mode checks as mentioned in a
couple places above. Repost once you've got an update fixing these issues.
Thanks,
Jeff