cobol: Support National characters and Unicode runtime encoding.

The COBOL compiler has evolved over the last few months. Until now it could use only CP1252/ASCII or CP1140/EBCDIC to represent alphanumeric variables and numeric types that are stored as character strings. With these changes, those types can also be represented in many other single-byte encodings, as well as in UTF-16 and UTF-32.

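As a rough illustration of what a choice of runtime encoding means for a numeric or alphanumeric value stored as a character string, the sketch below encodes the same three digits four ways. It uses Python's standard codecs purely as a stand-in; it is not how gcobol or libgcobol performs conversions, and the names `CODECS`, `stored_bytes`, and `stride` are hypothetical.

```python
# Illustration only: Python codecs stand in for whatever conversion
# machinery the COBOL run-time actually uses.

# Hypothetical labels mapped to Python codec names.
CODECS = {
    "CP1252": "cp1252",       # single-byte, ASCII-compatible
    "CP1140": "cp1140",       # single-byte EBCDIC
    "UTF-16LE": "utf-16-le",  # two bytes per code unit
    "UTF-32LE": "utf-32-le",  # four bytes per code unit
}


def stored_bytes(text: str, label: str) -> bytes:
    """Return the bytes a character-string field holding `text` would occupy."""
    return text.encode(CODECS[label])


def stride(label: str) -> int:
    """Bytes per character, measured on a plain digit."""
    return len("0".encode(CODECS[label]))


if __name__ == "__main__":
    for label in CODECS:
        data = stored_bytes("123", label)
        print(f"{label:9} stride={stride(label)} bytes={data.hex()}")
```

The point of the sketch is only that the same logical value occupies different bytes, and a different number of bytes, depending on the encoding attached to the variable.
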
Supporting these encodings required extensive changes:

1) The initial parsing has to handle the extended capabilities.
2) Each run-time variable designates its character set.
3) The run-time code has to be able to handle wide characters.

Since the development took place over a period of time, other changes crept in. In particular, the bindings that make certain POSIX functions available to the COBOL programmer have been expanded, as has gcobol's use of the GCC diagnostic framework.

Co-Authored-By: Robert Dubner mailto:[email protected]
Co-Authored-By: James K. Lowden mailto:[email protected]

gcc/cobol/ChangeLog:

	* cbldiag.h (struct cbl_loc_t): Diagnostics.
	(enum cbl_diag_id_t): Diagnostics.
	* cdf.y: Includes.
	* cobol1.cc (cobol_warning_suppress): Diagnostics.
	(cobol_langhook_handle_option): Implement -fexec-charset.
	Expand the use of diagnostics.
	* gcobc: Expand options and warnings.
	* gcobol.1: Documentation.
	* genapi.cc (level_88_helper): Charsets.
	(get_level_88_domain): Charsets.
	(get_class_condition_string): Charsets.
	(function_pointer_from_name): Charsets.
	(initialize_variable_internal): Charsets.
	(parser_initialize): Charsets.
	(get_binary_value_from_float): Charsets.
	(get_bytes_needed): Charsets.
	(cobol_compare): Charsets.
	(move_tree): Eliminate function.
	(move_tree_to_field): Eliminate function.
	(get_string_from): Eliminate function.
	(parser_init_list): Charsets.
	(psa_FldLiteralN): Charsets.
	(parser_accept_date_yymmdd): Charsets.
	(parser_accept_date_yyyymmdd): Charsets.
	(parser_accept_date_yyddd): Charsets.
	(parser_accept_date_yyyyddd): Charsets.
	(parser_accept_date_dow): Charsets.
	(parser_accept_date_hhmmssff): Charsets.
	(parser_alphabet): Charsets.
	(parser_alphabet_use): Charsets.
	(parser_display_internal): Charsets.
	(get_literalN_value): Charsets.
	(tree_type_from_field_type): Charsets.
	(program_end_stuff): Charsets.
	(walk_initialization): Charsets.
	(parser_xml_parse): Charsets.
	(initialize_the_data): Charsets.
	(establish_using): Charsets.
	(parser_setop): Charsets.
	(parser_set_conditional88): Charsets.
	(parser_file_add): Charsets.
	(get_the_filename): Eliminate function.
	(parser_file_open): Charsets.
	(parser_file_delete_file): Charsets.
	(parser_file_start): Charsets.
	(parser_module_name): Charsets.
	(parser_intrinsic_find_string): New function.
	(parser_intrinsic_numval_c): Charsets.
	(parser_intrinsic_convert): New function.
	(parser_intrinsic_call_1): Charsets.
	(create_and_call): Charsets.
	(mh_identical): Charsets.
	(mh_source_is_literalN): Charsets.
	(float_type_of): Charsets.
	(mh_dest_is_float): Charsets.
	(mh_numeric_display): Charsets.
	(mh_little_endian): Charsets.
	(mh_source_is_group): Charsets.
	(mh_source_is_literalA): Charsets.
	(move_helper): Charsets.
	(binary_initial): Eliminate function.
	(digits_from_int128): Eliminate function.
	(digits_from_float128): Eliminate function.
	(initial_from_initial): Eliminate function.
	(convert_data_initial): New function.
	(actually_create_the_static_field): Charsets.
	(psa_new_var_decl): Charsets.
	(psa_FldLiteralA): Charsets.
	(parser_local_add): Charsets.
	(parser_symbol_add): Charsets.
	* genapi.h (parser_intrinsic_convert): New function.
	(parser_intrinsic_find_string): New function.
	* genmath.cc (arithmetic_operation): Charsets.
	(largest_binary_term): Charsets.
	(fast_add): Charsets.
	(fast_subtract): Charsets.
	(fast_multiply): Charsets.
	(fast_divide): Charsets.
	(parser_subtract): Fix subtract float from float.
	* genutil.cc (get_any_capacity): Charsets.
	(get_and_check_refstart_and_reflen): Charsets.
	(get_data_offset): Charsets.
	(get_binary_value): Charsets.
	(tree_type_from_field): Charsets.
	(copy_little_endian_into_place): Charsets.
	(get_literal_string): Charsets.
	(refer_is_clean): Charsets.
	(refer_fill_depends): Charsets.
	(refer_size_source): Comment.
	* lang-specs.h: Charsets.
	* lang.opt: Charsets.
	* lexio.cc (parse_copy_directive): Diagnostics.
	* messages.cc (cbl_diagnostic_kind): Diagnostics.
	(cobol_warning_suppress): Diagnostics.
	* parse.y: Many changes for charsets and diagnostics.
	* parse_ante.h (MAXLENGTH_FORMATTED_DATE): Charsets.
	(MAXLENGTH_FORMATTED_TIME): Charsets.
	(MAXLENGTH_CALENDAR_DATE): Charsets.
	(MAXLENGTH_FORMATTED_DATETIME): Charsets.
	(consistent_encoding_check): Charsets.
	(enum data_clause_t): Charsets.
	(new_alphanumeric): Charsets.
	(name_of): Charsets.
	(class eval_subject_t): Charsets.
	(struct domain_t): Charsets.
	(struct file_list_t): Charsets.
	(current_encoding): Charsets.
	(new_tempnumeric): Charsets.
	(is_integer_literal): Charsets.
	(new_literal): Charsets.
	(new_constant): Charsets.
	(conditional_set): Charsets.
	(field_find): Charsets.
	(valid_redefine): Charsets.
	(field_value_all): Charsets.
	(parent_has_picture): Charsets.
	(parent_has_value): Charsets.
	(blank_pad_initial): Charsets.
	(blankit): Charsets.
	(cbl_field_t::blank_initial): Charsets.
	(value_encoding_check): Charsets.
	(cbl_field_t::set_initial): Charsets.
	(field_alloc): Charsets.
	(parser_move_carefully): Charsets.
	(data_division_ready): Charsets.
	(anybody_redefines): Charsets.
	(procedure_division_ready): Charsets.
	(file_section_parent_set): Charsets.
	(field_binary_usage): Charsets.
	(goodnight_gracie): Formatting.
	* scan.l: Charsets.
	* scan_ante.h (numstr_of): Charsets.
	(typed_name): Charsets.
	* show_parse.h: Charsets.
	* structs.cc (create_cblc_file_t): Charsets.
	* symbols.cc (symbol_table_extend): Charsets.
	(WARNING_FIELD): Diagnostics.
	(constq): Charsets.
	(elementize): Charsets.
	(field_size): Charsets.
	(cbl_field_t::set_attr): Eliminate run-time component.
	(cbl_field_t::clear_attr): Eliminate run-time component.
	(field_memsize): Charsets.
	(cbl_encoding_str): Charsets.
	(symbols_dump): Charsets.
	(is_variable_length): Formatting.
	(field_str): Charsets.
	(extend_66_capacity): Charsets.
	(operator<<): Charsets.
	(symbols_update): Charsets.
	(symbol_field_parent_set): Charsets.
	(symbol_table_init): Charsets.
	(numeric_group_attrs): Charsets.
	(symbol_field_add): Charsets.
	(symbol_field_alias): Charsets.
	(fd_record_size_cmp): Charsets.
	(symbol_file_record_sizes): Charsets.
	(cbl_alphabet_t::reencode): Charsets.
	(symbol_temporary_location): Charsets.
	(new_literal_2): Charsets.
	(new_alphanumeric): Charsets.
	(standard_internal): Charsets.
	(cbl_field_t::codeset_t::stride): Charsets.
	(cobol_alpha_encoding): Charsets.
	(cobol_national_encoding): Charsets.
	(new_temporary): Charsets.
	(new_literal_float): Charsets.
	(cbl_field_t::is_ascii): Charsets.
	(cbl_field_t::internalize): Eliminate function.
	(cbl_field_t::source_code_check): Charsets.
	(iconv_cd): Charsets.
	(cbl_field_t::encode): New function for charsets.
	(cbl_field_t::set_capacity): Charsets.
	(cbl_field_t::add_capacity): Charsets.
	(cbl_field_t::char_capacity): Charsets.
	(symbol_label_section_exists): Charsets.
	(size): Charsets.
	(validate_numeric_edited): Charsets.
	* symbols.h (cobol_alpha_encoding): Charsets.
	(cobol_national_encoding): Charsets.
	(consistent_encoding_check): Charsets.
	(class cbl_domain_elem_t): Charsets.
	(struct cbl_domain_t): Charsets.
	(struct cbl_field_data_t): Charsets.
	(class cbl_field_data_t): Charsets.
	(struct cbl_subtable_t): Charsets.
	(struct cbl_field_t): Charsets.
	(new_literal_float): Charsets.
	(new_temporary): Charsets.
	(new_literal_2): Charsets.
	(symbol_temporary_location): Charsets.
	(class temporaries_t): Charsets.
	(struct symbol_elem_t): Charsets.
	(symbol_elem_of): Charsets.
	(symbol_unique_index): Charsets.
	(cbl_field_type_name): Charsets.
	(validate_numeric_edited): Charsets.
	* token_names.h: Charsets.
	* util.cc (cdf_literalize): Charsets.
	(cbl_field_type_name): Charsets.
	(determine_intermediate_type): Charsets.
	(is_alpha_edited): Charsets.
	(cbl_field_data_t::is_alpha_edited): Charsets.
	(symbol_field_type_update): Charsets.
	(redefine_field): Charsets.
	(FIXED_WIDE_INT): Charsets.
	(dirty_to_binary): Charsets.
	(digits_from_int128): Charsets.
	(binary_initial): Charsets.
	(cbl_field_t::encode_numeric): Charsets.
	(FOR_JIM): Temporary conditional demonstration code.
	(parse_error_inc): Diagnostics.
	(parse_error_count): Diagnostics.
	(cbl_field_t::report_invalid_initial_value): Diagnostics.
	(valid_move): Diagnostics.
	(type_capacity): Charsets.
	(symbol_unique_index): New function.
	(cbl_unimplementedw): Formatting.

libgcobol/ChangeLog:

	* charmaps.cc (__gg__encoding_iconv_name): Charsets.
	(__gg__encoding_iconv_valid): Charsets.
	(__gg__encoding_iconv_type): Charsets.
	(encoding_descr): Charsets.
	(__gg__encoding_iconv_descr): Charsets.
	(__gg__iconverter): Charsets.
	(__gg__miconverter): Charsets.
	* charmaps.h (NOT_A_CHARACTER): Charsets.
	(ascii_nul): Charsets.
	(ascii_bang): Charsets.
	(__gg__encoding_iconv_type): Charsets.
	(__gg__iconverter): Charsets.
	(__gg__miconverter): Charsets.
	(DEFAULT_32_ENCODING): Charsets.
	(class charmap_t): Charsets.
	(__gg__get_charmap): Charsets.
	* common-defs.h (enum cbl_field_attr_t):
	(enum cbl_figconst_t): Formatting.
	(LOW_VALUE_E): Handle enum arithmetic.
	(ZERO_VALUE_E): Handle enum arithmetic.
	(SPACE_VALUE_E): Handle enum arithmetic.
	(QUOTE_VALUE_E): Handle enum arithmetic.
	(HIGH_VALUE_E): Handle enum arithmetic.
	(enum convert_type_t): Enum for new FUNCTION CONVERT.
	(struct cbl_declarative_t): Formatting.
	* encodings.h (struct encodings_t): Charsets.
	* gcobolio.h: Charsets.
	* gfileio.cc (get_filename): Rename to establish_filename.
	(establish_filename): Renamed from get_filename.
	(relative_file_delete): Charsets.
	(__io__file_remove): Moved.
	(trim_in_place): Charsets.
	(relative_file_start): Charsets.
	(relative_file_rewrite): Charsets.
	(relative_file_write): Charsets.
	(sequential_file_write): Charsets.
	(line_sequential_file_read): Charsets.
	(sequential_file_read): Charsets.
	(relative_file_read): Charsets.
	(__gg__file_reopen): Charsets.
	(__io__file_open): Charsets.
	(__io__file_close): Charsets.
	(gcobol_fileops): Charsets.
	(__gg__file_open): Charsets.
	(__gg__file_remove): Charsets.
	* gfileio.h (__gg__file_open): Charsets.
	* gmath.cc (__gg__subtractf1_float_phase2): Comment.
	(__gg__subtractf2_float_phase1): Comment.
	(__gg__multiplyf1_phase2): Comment.
	* intrinsic.cc (is_zulu_format): Charsets.
	(string_to_dest): Charsets.
	(get_all_time): Charsets.
	(ftime_replace): Charsets.
	(__gg__char): Charsets.
	(__gg__current_date): Charsets.
	(__gg__formatted_current_date): Charsets.
	(__gg__formatted_date): Charsets.
	(__gg__formatted_datetime): Charsets.
	(__gg__formatted_time): Charsets.
	(change_case): Charsets.
	(__gg__upper_case): Charsets.
	(numval): Charsets.
	(numval_c): Charsets.
	(__gg__trim): Charsets.
	(__gg__reverse): Charsets.
	(fill_cobol_tm): Charsets.
	(__gg__seconds_from_formatted_time): Charsets.
	(__gg__hex_of): Charsets.
	(__gg__numval_f): Charsets.
	(__gg__test_numval_f): Charsets.
	(__gg__locale_date): Charsets.
	(__gg__locale_time): Charsets.
	(__gg__locale_time_from_seconds): Charsets.
	* libgcobol.cc (NO_RDIGITS): Alias for (0).
	(__gg__move): Forward reference.
	(struct program_state): Charsets.
	(cstrncmp): Charsets.
	(__gg__init_program_state): Charsets.
	(edited_to_binary): Charsets.
	(var_is_refmod): Comment.
	(__gg__power_of_ten): Reworked data initialization.
	(__gg__scale_by_power_of_ten_1): Likewise.
	(__gg__scale_by_power_of_ten_2): Likewise.
	(value_is_too_big): Likewise.
	(binary_to_big_endian): Likewise.
	(binary_to_little_endian): Likewise.
	(int128_to_int128_rounded): Likewise.
	(get_binary_value_local): Likewise.
	(get_init_value): Likewise.
	(f128_to_i128_rounded): Likewise.
	(__gg__initialization_values): Likewise.
	(int128_to_field): Likewise.
	(__gg__get_date_yymmdd): Charsets.
	(__gg__field_from_string): Charsets.
	(field_from_ascii): Charsets.
	(__gg__get_date_yyyymmdd): Charsets.
	(__gg__get_date_yyddd): Charsets.
	(__gg__get_yyyyddd): Charsets.
	(__gg__get_date_dow): Charsets.
	(__gg__get_date_hhmmssff): Charsets.
	(collation_position): Charsets.
	(uber_compare): Charsets.
	(__gg__dirty_to_binary): Charsets.
	(__gg__dirty_to_float): Charsets.
	(format_for_display_internal): Charsets.
	(compare_88): Charsets.
	(get_float128): Reworked.
	(compare_field_class): Charsets.
	(interconvert): Charsets.
	(compare_strings): Charsets.
	(__gg__compare_2): Charsets.
	(compare_two_records): Charsets.
	(__gg__sort_table): Charsets.
	(init_var_both): Charsets.
	(__gg__initialize_variable_clean): Charsets.
	(alpha_to_alpha_move_from_location): Charsets.
	(__gg__memdup): New function.
	(alpha_to_alpha_move): Charsets.
	(__gg__sort_workfile): Charsets.
	(__gg__merge_files): Charsets.
	(funky_find_wide): Charsets.
	(funky_find_wide_backward): Charsets.
	(normalize_id): Charsets.
	(match_lengths): Charsets.
	(the_alpha_and_omega): Charsets.
	(the_alpha_and_omega_backward): Charsets.
	(inspect_backward_format_1): Charsets.
	(__gg__inspect_format_1): Charsets.
	(inspect_backward_format_2): Charsets.
	(__gg__inspect_format_2): Charsets.
	(normalize_for_inspect_format_4): Charsets.
	(__gg__inspect_format_4): Charsets.
	(move_string): Charsets.
	(brute_force_trim): Charsets.
	(__gg__string): Charsets.
	(display_both): Charsets.
	(__gg__display_string): Charsets.
	(__gg__bitwise_op): Charsets.
	(is_numeric_display_numeric): Charsets.
	(is_alpha_a_number): Charsets.
	(classify_numeric_type): Charsets.
	(classify_alphabetic_type): Charsets.
	(__gg__classify): Charsets.
	(__gg__convert_encoding): Charsets.
	(accept_envar): Charsets.
	(__gg__accept_envar): Charsets.
	(__gg__get_argc): Charsets.
	(__gg__get_argv): Charsets.
	(__gg__get_command_line): Charsets.
	(__gg__parser_set_conditional): Charsets.
	(__gg__literaln_alpha_compare): Charsets.
	(string_in): Charsets.
	(__gg__unstring): Charsets.
	(__gg__integer_from_float128): Charsets.
	(__gg__adjust_dest_size): Charsets.
	(__gg__just_mangle_name): Charsets.
	(__gg__function_handle_from_name): Charsets.
	(get_the_byte): Charsets.
	(__gg__refer_from_string): Charsets.
	(__gg__refer_from_psz): Charsets.
	(__gg__find_string): Charsets.
	(convert_for_convert): Charsets.
	(__gg__convert): Charsets.
	* libgcobol.h (__gg__compare_2): Charsets.
	(__gg__field_from_string): Charsets.
	(__gg__memdup): Charsets.
	* posix/bin/Makefile: Posix bindings.
	* posix/bin/scrape.awk: Posix bindings.
	* posix/bin/udf-gen: Posix bindings.
	* posix/udf/posix-lseek.cbl: Posix bindings.
	* posix/udf/posix-unlink.cbl: Posix bindings.
	* stringbin.cc (__gg__binary_to_string_encoded): Charsets.
	(__gg__numeric_display_to_binary): Charsets.
	* stringbin.h (__gg__binary_to_string_encoded): Charsets.
	* valconv.cc (__gg__string_to_numeric_edited): Charsets.
	* posix/cpy/psx-lseek.cpy: New file.
	* posix/shim/lseek.cc: New file.

gcc/testsuite/ChangeLog:

	* cobol.dg/group2/CHAR_and_ORD_with_COLLATING_sequence_-_EBCDIC.cob:
	Change diagnostics message.
	* cobol.dg/group2/Multi-target_MOVE_with_subscript_re-evaluation.cob:
	Change diagnostics message.
	* cobol.dg/group2/floating-point_SUBTRACT_FORMAT_2.out:
	Change diagnostics message.
	* cobol.dg/group2/floating-point_literals.out:
	Change diagnostics message.
