[llvm-branch-commits] [llvm] [AArch64][PAC] Reduce the size of synchronous CFI (PR #96377)

2024-06-22 Thread Igor Kudrin via llvm-branch-commits

https://github.com/igorkudrin created 
https://github.com/llvm/llvm-project/pull/96377

For synchronous unwind tables, the call frame information can be made slightly
smaller by bundling the `.cfi_negate_ra_state` instruction with the other CFI
instructions in the prolog. This saves, per function, the 1 byte that would
otherwise be spent on a `DW_CFA_advance_loc`.

This was suggested in [D156428](https://reviews.llvm.org/D156428#4554317).
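
A minimal sketch of where the saved byte comes from. It assumes a hypothetical two-instruction prolog (`paciasp` at offset 0, `str x30, [sp, #-16]!` at offset 4), and assumed code and data alignment factors of 4 and -8. The real factors come from the CIE, and they do not change the byte counts here. In the asynchronous layout, `.cfi_negate_ra_state` sits right after `paciasp`, so a second `DW_CFA_advance_loc` is needed before the directives that follow the store. In the synchronous layout, all three directives share one location. `CFIDirective`, `encodeCFI`, and `prologCFI` are illustrative names, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// DWARF call frame opcodes (DWARF 5, Section 6.4.2). 0x2d is the AArch64
// vendor opcode DW_CFA_AARCH64_negate_ra_state.
constexpr uint8_t DW_CFA_advance_loc = 0x40; // low 6 bits: factored delta
constexpr uint8_t DW_CFA_offset = 0x80;      // low 6 bits: register number
constexpr uint8_t DW_CFA_def_cfa_offset = 0x0e;
constexpr uint8_t DW_CFA_AARCH64_negate_ra_state = 0x2d;

// One CFI directive, attached to the code offset at which it takes effect.
struct CFIDirective {
  uint32_t PCOffset;
  std::vector<uint8_t> Bytes; // opcode followed by its encoded operands
};

// Encode a run of directives, emitting a one-byte DW_CFA_advance_loc whenever
// the location moves. CodeAlign is an assumed code alignment factor.
std::vector<uint8_t> encodeCFI(const std::vector<CFIDirective> &Dirs,
                               uint32_t CodeAlign = 4) {
  std::vector<uint8_t> Out;
  uint32_t Loc = 0;
  for (const CFIDirective &D : Dirs) {
    if (D.PCOffset != Loc) {
      uint32_t Delta = (D.PCOffset - Loc) / CodeAlign;
      assert(Delta < 64 && "sketch only handles the one-byte form");
      Out.push_back(static_cast<uint8_t>(DW_CFA_advance_loc | Delta));
      Loc = D.PCOffset;
    }
    Out.insert(Out.end(), D.Bytes.begin(), D.Bytes.end());
  }
  return Out;
}

// Hypothetical prolog: "paciasp" at offset 0, "str x30, [sp, #-16]!" at 4.
// Async: negate_ra_state right after paciasp (offset 4).
// Sync:  negate_ra_state bundled with the other prolog CFI (offset 8).
std::vector<CFIDirective> prologCFI(bool Async) {
  const uint32_t NegateAt = Async ? 4 : 8;
  return {
      {NegateAt, {DW_CFA_AARCH64_negate_ra_state}},
      {8, {DW_CFA_def_cfa_offset, 16}},
      // x30 saved at CFA-16, factored by an assumed data alignment of -8.
      {8, {DW_CFA_offset | 30, 2}},
  };
}
```

Under these assumptions, `encodeCFI(prologCFI(true))` yields 7 bytes and `encodeCFI(prologCFI(false))` yields 6. The difference is the dropped `DW_CFA_advance_loc`.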

>From 4880bc9fca58a185f70acf00a8c31891184272cd Mon Sep 17 00:00:00 2001
From: Igor Kudrin 
Date: Thu, 20 Jun 2024 18:53:45 -0700
Subject: [PATCH] [AArch64][PAC] Reduce the size of synchronous CFI

For synchronous unwind tables, the call frame information can be made
slightly smaller by bundling the `.cfi_negate_ra_state` instruction
with the other CFI instructions in the prolog. This saves, per
function, the 1 byte otherwise spent on a `DW_CFA_advance_loc`.

This was suggested in [D156428](https://reviews.llvm.org/D156428#4554317).
---
 .../lib/Target/AArch64/AArch64PointerAuth.cpp | 13 +
 .../machine-outliner-retaddr-sign-cfi.ll  |  3 +-
 ...tliner-retaddr-sign-diff-scope-same-key.ll |  6 ++--
 .../machine-outliner-retaddr-sign-non-leaf.ll |  9 --
 .../machine-outliner-retaddr-sign-regsave.mir |  3 +-
 ...tliner-retaddr-sign-same-scope-diff-key.ll |  9 --
 ...machine-outliner-retaddr-sign-subtarget.ll |  9 --
 .../machine-outliner-retaddr-sign-thunk.ll| 12 +---
 .../AArch64/pacbti-llvm-generated-funcs-2.ll  |  9 --
 ...sign-return-address-cfi-negate-ra-state.ll | 13 +
 .../AArch64/sign-return-address-pauth-lr.ll   | 28 +--
 .../CodeGen/AArch64/sign-return-address.ll| 18 ++--
 12 files changed, 84 insertions(+), 48 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp b/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
index e900f6881620f..eb0ff73200407 100644
--- a/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
+++ b/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
@@ -100,6 +100,7 @@ void AArch64PointerAuth::signLR(MachineFunction &MF,
   auto &MFnI = *MF.getInfo<AArch64FunctionInfo>();
   bool UseBKey = MFnI.shouldSignWithBKey();
   bool EmitCFI = MFnI.needsDwarfUnwindInfo(MF);
+  bool EmitAsyncCFI = MFnI.needsAsyncDwarfUnwindInfo(MF);
   bool NeedsWinCFI = MF.hasWinCFI();
 
   MachineBasicBlock &MBB = *MBBI->getParent();
@@ -137,6 +138,18 @@ void AArch64PointerAuth::signLR(MachineFunction &MF,
   }
 
   if (EmitCFI) {
+    if (!EmitAsyncCFI) {
+      // Reduce the size of the generated call frame information for synchronous
+      // CFI by bundling the new CFI instruction with others in the prolog, so
+      // that no additional DW_CFA_advance_loc is needed.
+      for (auto I = MBBI; I != MBB.end(); ++I) {
+        if (I->getOpcode() == TargetOpcode::CFI_INSTRUCTION &&
+            I->getFlag(MachineInstr::FrameSetup)) {
+          MBBI = I;
+          break;
+        }
+      }
+    }
     unsigned CFIIndex =
         MF.addFrameInst(MCCFIInstruction::createNegateRAState(nullptr));
     BuildMI(MBB, MBBI, DL, TII->get(TargetOpcode::CFI_INSTRUCTION))
diff --git a/llvm/test/CodeGen/AArch64/machine-outliner-retaddr-sign-cfi.ll b/llvm/test/CodeGen/AArch64/machine-outliner-retaddr-sign-cfi.ll
index 4bbbe40176313..c64b3842aa5ba 100644
--- a/llvm/test/CodeGen/AArch64/machine-outliner-retaddr-sign-cfi.ll
+++ b/llvm/test/CodeGen/AArch64/machine-outliner-retaddr-sign-cfi.ll
@@ -11,7 +11,8 @@ define void @a() "sign-return-address"="all" "sign-return-address-key"="b_key" {
 ; CHECK-NEXT:  .cfi_b_key_frame
 ; V8A-NEXT:hint #27
 ; V83A-NEXT:   pacibsp
-; CHECK-NEXT:  .cfi_negate_ra_state
+; CHECK:   .cfi_negate_ra_state
+; CHECK-NEXT:  .cfi_def_cfa_offset
   %1 = alloca i32, align 4
   %2 = alloca i32, align 4
   %3 = alloca i32, align 4
diff --git a/llvm/test/CodeGen/AArch64/machine-outliner-retaddr-sign-diff-scope-same-key.ll b/llvm/test/CodeGen/AArch64/machine-outliner-retaddr-sign-diff-scope-same-key.ll
index f4e9c0a4c2204..3221815da33c5 100644
--- a/llvm/test/CodeGen/AArch64/machine-outliner-retaddr-sign-diff-scope-same-key.ll
+++ b/llvm/test/CodeGen/AArch64/machine-outliner-retaddr-sign-diff-scope-same-key.ll
@@ -7,7 +7,8 @@ define void @a() "sign-return-address"="all" {
 ; CHECK-LABEL:  a: // @a
 ; V8A:  hint #25
 ; V83A: paciasp
-; CHECK-NEXT:   .cfi_negate_ra_state
+; CHECK:.cfi_negate_ra_state
+; CHECK-NEXT:   .cfi_def_cfa_offset
   %1 = alloca i32, align 4
   %2 = alloca i32, align 4
   %3 = alloca i32, align 4
@@ -54,7 +55,8 @@ define void @c() "sign-return-address"="all" {
 ; CHECK-LABEL: c:  // @c
 ; V8A: hint #25
 ; V83A:paciasp
-; CHECK-NEXT:  .cfi_negate_ra_state
+; CHECK:  .cfi_negate_ra_state
+; CHECK-NEXT: .cfi_def_cfa_offset
   %1 = alloca i32, align 4
   %2 = alloca i32, align 4
   %3 = alloca i32, align 4
diff --git 

[llvm-branch-commits] [llvm] [AArch64][PAC] Fix creating check instructions for BBs without an epilog (PR #92508)

2024-05-17 Thread Igor Kudrin via llvm-branch-commits

https://github.com/igorkudrin created 
https://github.com/llvm/llvm-project/pull/92508

`AArch64PAuth::checkAuthenticatedRegister()` splits the basic block containing
the tail call instruction to add the check instructions. It assumes that at
least one other instruction (the AUT* instruction) precedes the call in that
block. This assumption does not hold when some execution paths reach the
terminating block without creating a stack frame. In that case, the
authentication is emitted in a predecessor block, so the tail call may be the
first instruction of its block. This patch rearranges the creation of the checks
so that splitting the block beforehand is no longer required.
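
The following toy model, which is not LLVM code, sketches why the new ordering no longer needs a split. The old code called `MBB.splitAt(*std::prev(MBBI))`, which requires at least one instruction before the tail call. The patch instead inserts the check directly in front of `MBBI` and only adds the BRK block as a new successor. `Block` and `insertLRCheck` are hypothetical stand-ins for `MachineBasicBlock` and `BuildMI`. The instruction strings follow the HighBitsNoTBI sequence and the `brk #0xc470` immediate from the test above.

```cpp
#include <cassert>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine basic block (not LLVM's API).
struct Block {
  std::string Name;
  std::list<std::string> Insts;
  std::vector<Block *> Succs;
};

using InstIt = std::list<std::string>::iterator;

// Insert the LR check in front of the tail call at Call, in the same block.
// On failure, control branches to BreakBlock, which only executes BRK.
// Nothing has to precede Call, so no split point is needed.
void insertLRCheck(Block &MBB, InstIt Call, Block &BreakBlock) {
  MBB.Insts.insert(Call, "eor x16, x30, x30, lsl #1");
  MBB.Insts.insert(Call, "tbnz x16, #62, " + BreakBlock.Name);
  MBB.Succs.push_back(&BreakBlock);
  BreakBlock.Insts.push_back("brk #0xc470");
}
```

Because the insertion point is the tail call itself, the sketch works even when `b callee` is the only instruction in the block. That is exactly the case the removed `assert(MBBI != MBB.begin() ...)` rejected.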

>From a3039508f7bf9eeacbb4739460468cb3e71ba133 Mon Sep 17 00:00:00 2001
From: Igor Kudrin 
Date: Thu, 16 May 2024 22:26:32 -0700
Subject: [PATCH 1/2] test

---
 .../AArch64/sign-return-address-tailcall.ll   | 32 +++
 1 file changed, 32 insertions(+)

diff --git a/llvm/test/CodeGen/AArch64/sign-return-address-tailcall.ll b/llvm/test/CodeGen/AArch64/sign-return-address-tailcall.ll
index cf033cb8208cc..0cc707298e458 100644
--- a/llvm/test/CodeGen/AArch64/sign-return-address-tailcall.ll
+++ b/llvm/test/CodeGen/AArch64/sign-return-address-tailcall.ll
@@ -129,4 +129,36 @@ define i32 @tailcall_ib_key() "sign-return-address"="all" "sign-return-address-k
   ret i32 %call
 }
 
+define i32 @tailcall_two_branches(i1 %0) "sign-return-address"="all" {
+; COMMON-LABEL:tailcall_two_branches:
+; COMMON:tbz w0, #0, .[[ELSE:LBB[_0-9]+]]
+; COMMON:str x30, [sp, #-16]!
+; COMMON:bl callee2
+; COMMON:ldr x30, [sp], #16
+; COMMON-NEXT:   [[AUTIASP]]
+; COMMON-NEXT: .[[ELSE]]:
+
+; LDR-NEXT:  ldr w16, [x30]
+;
+; BITS-NOTBI-NEXT:   eor x16, x30, x30, lsl #1
+; BITS-NOTBI-NEXT:   tbnz x16, #62, .[[FAIL:LBB[_0-9]+]]
+;
+; XPAC-NEXT: mov x16, x30
+; XPAC-NEXT: [[XPACLRI]]
+; XPAC-NEXT: cmp x16, x30
+; XPAC-NEXT: b.ne .[[FAIL:LBB[_0-9]+]]
+;
+; COMMON-NEXT:   b callee
+; BRK-NEXT:.[[FAIL]]:
+; BRK-NEXT:  brk #0xc470
+  br i1 %0, label %2, label %3
+2:
+  call void @callee2()
+  br label %3
+3:
+  %call = tail call i32 @callee()
+  ret i32 %call
+}
+
 declare i32 @callee()
+declare void @callee2()

>From 2641fe82837455b422d6c8229cc2f3d3736de4da Mon Sep 17 00:00:00 2001
From: Igor Kudrin 
Date: Thu, 16 May 2024 22:26:40 -0700
Subject: [PATCH 2/2] [AArch64][PAC] Fix creating check instructions for BBs
 without an epilog

`AArch64PAuth::checkAuthenticatedRegister()` splits the basic block
containing the tail call instruction to add the check instructions.
It assumes that at least one other instruction (the AUT* instruction)
precedes the call in that block. This assumption does not hold when
some execution paths reach the terminating block without creating a
stack frame. In that case, the authentication is emitted in a
predecessor block, so the tail call may be the first instruction of
its block. This patch rearranges the creation of the checks so that
splitting the block beforehand is no longer required.
---
 .../lib/Target/AArch64/AArch64PointerAuth.cpp | 23 ++-
 1 file changed, 7 insertions(+), 16 deletions(-)

diff --git a/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp b/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
index 90bf089dbebf7..60d3d533d9c10 100644
--- a/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
+++ b/llvm/lib/Target/AArch64/AArch64PointerAuth.cpp
@@ -257,21 +257,12 @@ void llvm::AArch64PAuth::checkAuthenticatedRegister(
 
   // Control flow has to be changed, so arrange new MBBs.
 
-  // At now, at least an AUT* instruction is expected before MBBI
-  assert(MBBI != MBB.begin() &&
-         "Cannot insert the check at the very beginning of MBB");
-  // The block to insert check into.
-  MachineBasicBlock *CheckBlock = &MBB;
-  // The remaining part of the original MBB that is executed on success.
-  MachineBasicBlock *SuccessBlock = MBB.splitAt(*std::prev(MBBI));
-
   // The block that explicitly generates a break-point exception on failure.
   MachineBasicBlock *BreakBlock =
       MF.CreateMachineBasicBlock(MBB.getBasicBlock());
   MF.push_back(BreakBlock);
-  MBB.splitSuccessor(SuccessBlock, BreakBlock);
+  MBB.addSuccessor(BreakBlock);
 
-  assert(CheckBlock->getFallThrough() == SuccessBlock);
   BuildMI(BreakBlock, DL, TII->get(AArch64::BRK)).addImm(BrkImm);
 
   switch (Method) {
@@ -279,11 +270,11 @@ void llvm::AArch64PAuth::checkAuthenticatedRegister(
   case AuthCheckMethod::DummyLoad:
     llvm_unreachable("Should be handled above");
   case AuthCheckMethod::HighBitsNoTBI:
-    BuildMI(CheckBlock, DL, TII->get(AArch64::EORXrs), TmpReg)
+    BuildMI(MBB, MBBI, DL, TII->get(AArch64::EORXrs), TmpReg)
         .addReg(AuthenticatedReg)
         .addReg(AuthenticatedReg)
         .addImm(1);
-    BuildMI(CheckBlock, DL, TII->get(AArch64::TBNZX))
+    BuildMI(MBB, MBBI, DL, TII->get(AArch64::TBNZX))
         .addReg(TmpReg)
         .addImm(62)
         .addMBB(BreakBlock);
@@ -292,16 +283,16 @@ void llvm::AArch64PAuth::checkAuthenticatedRegister(
     assert(AuthenticatedReg == AArch64::LR &&
            "XPACHint mode is only compatible with checking the LR register");
     assert(UseIKey && "XPACHint mode is only compatible with I-keys");
-

[llvm-branch-commits] [llvm] [YAMLParser] Unfold multi-line scalar values (PR #70898)

2023-11-09 Thread Igor Kudrin via llvm-branch-commits

https://github.com/igorkudrin updated 
https://github.com/llvm/llvm-project/pull/70898

>From f38dc24c2dd940e18eb424746d13cd99e3ffdd91 Mon Sep 17 00:00:00 2001
From: Igor Kudrin 
Date: Tue, 7 Nov 2023 18:42:02 -0800
Subject: [PATCH] [YAMLParser] Unfold multi-line scalar values

Long scalar values can be split across multiple lines to improve
readability. The rules are described in Section 6.5, "Line Folding",
https://yaml.org/spec/1.2.2/#65-line-folding. In addition, for flow
scalar styles, the spec states that "All leading and trailing white
space characters on each line are excluded from the content",
https://yaml.org/spec/1.2.2/#73-flow-scalar-styles.

The patch implements these unfolding rules for double-quoted,
single-quoted, and plain scalars.
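
The following standalone function sketches the folding rules referenced above. It is a simplified illustration written against the YAML 1.2.2 text, not the code in this patch. It assumes the quotes have already been stripped, and it ignores escape sequences entirely. `unfoldFlowScalar` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

static std::string_view ltrimWS(std::string_view S) {
  size_t P = S.find_first_not_of(" \t");
  return P == std::string_view::npos ? std::string_view() : S.substr(P);
}

static std::string_view rtrimWS(std::string_view S) {
  size_t P = S.find_last_not_of(" \t");
  return P == std::string_view::npos ? std::string_view() : S.substr(0, P + 1);
}

// Unfold the content of a multi-line flow scalar (escapes NOT handled):
//  * white space around each line break is dropped;
//  * a single line break becomes one space;
//  * a break followed by N empty lines becomes N '\n' characters.
std::string unfoldFlowScalar(std::string_view Raw) {
  std::vector<std::string_view> Lines;
  for (size_t Start = 0;;) {
    size_t End = Raw.find('\n', Start);
    std::string_view Line = Raw.substr(
        Start, End == std::string_view::npos ? std::string_view::npos
                                             : End - Start);
    if (!Line.empty() && Line.back() == '\r')
      Line.remove_suffix(1); // treat "\r\n" as a single break
    Lines.push_back(Line);
    if (End == std::string_view::npos)
      break;
    Start = End + 1;
  }
  if (Lines.size() == 1)
    return std::string(Raw); // single line: nothing to unfold

  std::string Out;
  size_t EmptyLines = 0;
  for (size_t I = 0; I < Lines.size(); ++I) {
    bool Last = I + 1 == Lines.size();
    std::string_view L = Lines[I];
    if (I != 0)
      L = ltrimWS(L); // leading white space of continuation lines
    if (!Last)
      L = rtrimWS(L); // trailing white space before a break
    if (I == 0) {
      Out.append(L);
      continue;
    }
    if (L.empty() && !Last) {
      ++EmptyLines; // whitespace-only lines count as empty
      continue;
    }
    if (EmptyLines == 0)
      Out += ' ';
    else
      Out.append(EmptyLines, '\n');
    EmptyLines = 0;
    Out.append(L);
  }
  return Out;
}
```

For example, the content `as space\n trimmed \n\n specific` unfolds to `as space trimmed\nspecific`. The single break becomes a space, and the break followed by one empty line becomes one `\n`.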
---
 llvm/include/llvm/Support/YAMLParser.h|   9 +-
 llvm/lib/Support/YAMLParser.cpp   | 373 --
 llvm/test/YAMLParser/spec-05-13.test  |   2 +-
 llvm/test/YAMLParser/spec-05-14.test  |   2 +-
 llvm/test/YAMLParser/spec-09-01.test  |   4 +-
 llvm/test/YAMLParser/spec-09-02.test  |  18 +-
 llvm/test/YAMLParser/spec-09-03.test  |   6 +-
 llvm/test/YAMLParser/spec-09-04.test  |   2 +-
 llvm/test/YAMLParser/spec-09-05.test  |   6 +-
 llvm/test/YAMLParser/spec-09-07.test  |   4 +-
 llvm/test/YAMLParser/spec-09-08.test  |   8 +-
 llvm/test/YAMLParser/spec-09-09.test  |   6 +-
 llvm/test/YAMLParser/spec-09-10.test  |   2 +-
 llvm/test/YAMLParser/spec-09-11.test  |   4 +-
 llvm/test/YAMLParser/spec-09-13.test  |   4 +-
 llvm/test/YAMLParser/spec-09-16.test  |   8 +-
 llvm/test/YAMLParser/spec-09-17.test  |   2 +-
 llvm/test/YAMLParser/spec-10-02.test  |   6 +-
 llvm/test/YAMLParser/spec1.2-07-05.test   |   2 +-
 llvm/test/YAMLParser/spec1.2-07-06.test   |   2 +-
 llvm/test/YAMLParser/spec1.2-07-09.test   |   2 +-
 llvm/test/YAMLParser/spec1.2-07-12.test   |   2 +-
 llvm/unittests/Support/YAMLParserTest.cpp | 102 ++
 23 files changed, 376 insertions(+), 200 deletions(-)

diff --git a/llvm/include/llvm/Support/YAMLParser.h b/llvm/include/llvm/Support/YAMLParser.h
index f4767641647c217..9d95a1e13a0dff4 100644
--- a/llvm/include/llvm/Support/YAMLParser.h
+++ b/llvm/include/llvm/Support/YAMLParser.h
@@ -240,9 +240,14 @@ class ScalarNode final : public Node {
 private:
   StringRef Value;
 
-  StringRef unescapeDoubleQuoted(StringRef UnquotedValue,
-                                 StringRef::size_type Start,
+  StringRef getDoubleQuotedValue(StringRef UnquotedValue,
                                  SmallVectorImpl<char> &Storage) const;
+
+  static StringRef getSingleQuotedValue(StringRef RawValue,
+                                        SmallVectorImpl<char> &Storage);
+
+  static StringRef getPlainValue(StringRef RawValue,
+                                 SmallVectorImpl<char> &Storage);
 };
 
 /// A block scalar node is an opaque datum that can be presented as a
diff --git a/llvm/lib/Support/YAMLParser.cpp b/llvm/lib/Support/YAMLParser.cpp
index b47cb3ae3b44a75..fdd0ed6e682eb5e 100644
--- a/llvm/lib/Support/YAMLParser.cpp
+++ b/llvm/lib/Support/YAMLParser.cpp
@@ -2030,184 +2030,229 @@ bool Node::failed() const {
 }
 
 StringRef ScalarNode::getValue(SmallVectorImpl<char> &Storage) const {
-  // TODO: Handle newlines properly. We need to remove leading whitespace.
-  if (Value[0] == '"') { // Double quoted.
-    // Pull off the leading and trailing "s.
-    StringRef UnquotedValue = Value.substr(1, Value.size() - 2);
-    // Search for characters that would require unescaping the value.
-    StringRef::size_type i = UnquotedValue.find_first_of("\\\r\n");
-    if (i != StringRef::npos)
-      return unescapeDoubleQuoted(UnquotedValue, i, Storage);
+  if (Value[0] == '"')
+    return getDoubleQuotedValue(Value, Storage);
+  if (Value[0] == '\'')
+    return getSingleQuotedValue(Value, Storage);
+  return getPlainValue(Value, Storage);
+}
+
+/// parseScalarValue - A common parsing routine for all flow scalar styles.
+/// It handles line break characters by itself, adds regular content characters
+/// to the result, and forwards escape sequences to the provided routine for
+/// style-specific processing.
+///
+/// \param UnquotedValue - An input value without quotation marks.
+/// \param Storage - A storage for the result if the input value is multiline or
+/// contains escaped characters.
+/// \param LookupChars - A set of special characters to search in the input
+/// string. Should include line break characters and the escape character
+/// specific to the scalar style being processed, if any.
+/// \param UnescapeCallback - This is called when the escape character is found
+/// in the input.
+/// \returns - The unfolded and unescaped value.
+static StringRef
+parseScalarValue(StringRef UnquotedValue, SmallVectorImpl<char> &Storage,
+                 StringRef LookupChars,
+                 std::function<StringRef(StringRef, SmallVectorImpl<char> &)>
+                     UnescapeCallback) {
+  size_t I = UnquotedValue.find_first_of(LookupChars);
+  if (I == StringRef::npos)
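
The doc comment above describes a shared scanner driven by a set of lookup characters and a style-specific callback. The sketch below shows that shape under several assumptions. It uses `std::string_view` and `std::string` in place of `StringRef` and `SmallVectorImpl<char>`, and `parseScalar` and `unescapeSingleQuote` are hypothetical names. It keeps the fast path visible at the start of the routine: search for `LookupChars`, and presumably return the input untouched when none is found, since the diff is cut off at that point. It leaves out the line-break handling that the real routine performs itself. Single-quoted style is used because its only escape is `''`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <string_view>

// Consumes the escape sequence at the front of In, appends its meaning to
// Storage, and returns the unconsumed remainder of the input.
using UnescapeFn =
    std::function<std::string_view(std::string_view, std::string &)>;

// Copy regular content and hand every lookup character to the style-specific
// callback. Line breaks are NOT handled here (unlike the real routine).
std::string_view parseScalar(std::string_view In, std::string &Storage,
                             std::string_view LookupChars,
                             const UnescapeFn &Unescape) {
  size_t I = In.find_first_of(LookupChars);
  if (I == std::string_view::npos)
    return In; // Fast path: nothing to rewrite, so no copy is made.

  Storage.clear();
  while (I != std::string_view::npos) {
    Storage.append(In.substr(0, I));      // regular content before the escape
    In = Unescape(In.substr(I), Storage); // style-specific processing
    I = In.find_first_of(LookupChars);
  }
  Storage.append(In); // trailing regular content
  return Storage;
}

// Single-quoted style: the only escape is '' standing for one quote.
std::string_view unescapeSingleQuote(std::string_view In,
                                     std::string &Storage) {
  Storage.push_back('\'');
  // Skip both quotes of the pair; a lone quote (malformed input) is skipped too.
  return In.substr(In.size() >= 2 && In[1] == '\'' ? 2 : 1);
}
```

When the input contains no lookup character, the returned view points into the input itself. `Storage` is only written when something actually has to be rewritten.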
 

[llvm-branch-commits] [llvm] b9b9c49 - [YAMLParser] Fix handling escaped line breaks in double-quoted scalars

2023-11-09 Thread Igor Kudrin via llvm-branch-commits

Author: Igor Kudrin
Date: 2023-11-09T16:12:49-08:00
New Revision: b9b9c49c018a28c46d7709ed3b8c8fcb53036f8f

URL: 
https://github.com/llvm/llvm-project/commit/b9b9c49c018a28c46d7709ed3b8c8fcb53036f8f
DIFF: 
https://github.com/llvm/llvm-project/commit/b9b9c49c018a28c46d7709ed3b8c8fcb53036f8f.diff

LOG: [YAMLParser] Fix handling escaped line breaks in double-quoted scalars

Leading white space on the line following an escaped line break should
be excluded from the content.
See https://yaml.org/spec/1.2.2/#731-double-quoted-style.
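
The rule in this commit can be illustrated in isolation. The sketch below mirrors what the new `case '\n':` branch does with `drop_front(1).ltrim(" \t")`. After a backslash-escaped line break (`\n` or `\r\n`), it drops the break and the leading spaces and tabs of the next line, and it keeps any white space before the backslash. `joinEscapedLineBreaks` is a hypothetical helper. Every other escape sequence is copied through unchanged rather than decoded.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Remove each escaped line break together with the leading white space of
// the following line. Other escapes are copied verbatim (not decoded).
std::string joinEscapedLineBreaks(std::string_view S) {
  std::string Out;
  for (size_t I = 0; I < S.size(); ++I) {
    if (S[I] == '\\' && I + 1 < S.size() &&
        (S[I + 1] == '\n' || S[I + 1] == '\r')) {
      size_t J = I + 1;
      if (S[J] == '\r' && J + 1 < S.size() && S[J + 1] == '\n')
        ++J; // Windows-style EOL counts as one break
      ++J;   // skip the break itself
      while (J < S.size() && (S[J] == ' ' || S[J] == '\t'))
        ++J; // skip leading white space of the next line
      I = J - 1; // the loop increment moves I to J
      continue;
    }
    Out += S[I];
    if (S[I] == '\\' && I + 1 < S.size())
      Out += S[++I]; // keep "\\", "\t", ... as two characters
  }
  return Out;
}
```

An escaped backslash followed by a real line break (`\\` then a newline) is left alone, because the second backslash is consumed as part of the first escape.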

Added: 


Modified: 
llvm/lib/Support/YAMLParser.cpp
llvm/test/YAMLParser/spec-09-02.test
llvm/test/YAMLParser/spec-09-04.test
llvm/test/YAMLParser/spec1.2-07-05.test

Removed: 




diff  --git a/llvm/lib/Support/YAMLParser.cpp b/llvm/lib/Support/YAMLParser.cpp
index 17d727b6cc07da8..b47cb3ae3b44a75 100644
--- a/llvm/lib/Support/YAMLParser.cpp
+++ b/llvm/lib/Support/YAMLParser.cpp
@@ -2107,14 +2107,13 @@ StringRef ScalarNode::unescapeDoubleQuoted( StringRef UnquotedValue
       return "";
     }
     case '\r':
+      // Shrink the Windows-style EOL.
+      if (UnquotedValue.size() >= 2 && UnquotedValue[1] == '\n')
+        UnquotedValue = UnquotedValue.drop_front(1);
+      [[fallthrough]];
     case '\n':
-      // Remove the new line.
-      if (   UnquotedValue.size() > 1
-          && (UnquotedValue[1] == '\r' || UnquotedValue[1] == '\n'))
-        UnquotedValue = UnquotedValue.substr(1);
-      // If this was just a single byte newline, it will get skipped
-      // below.
-      break;
+      UnquotedValue = UnquotedValue.drop_front(1).ltrim(" \t");
+      continue;
     case '0':
       Storage.push_back(0x00);
       break;

diff  --git a/llvm/test/YAMLParser/spec-09-02.test b/llvm/test/YAMLParser/spec-09-02.test
index 6b68a00e3fc3e6f..51ea61dd23273d3 100644
--- a/llvm/test/YAMLParser/spec-09-02.test
+++ b/llvm/test/YAMLParser/spec-09-02.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s 2>&1 | FileCheck %s --strict-whitespace
-# CHECK: "as space\n trimmed \n specific\L\n escaped\t \n none"
+# CHECK: "as space\n trimmed \n specific\L\n escaped\t\n none"
 
 ## Note: The example was originally taken from Spec 1.1, but the parsing rules
 ## have been changed since then.

diff  --git a/llvm/test/YAMLParser/spec-09-04.test b/llvm/test/YAMLParser/spec-09-04.test
index 1e904eaa70992e5..e4f77ea83c7ac5f 100644
--- a/llvm/test/YAMLParser/spec-09-04.test
+++ b/llvm/test/YAMLParser/spec-09-04.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
-# CHECK: "first\n \tinner 1\t\n  inner 2  last"
+# CHECK: "first\n \tinner 1\t\n  inner 2 last"
 
  "first
inner 1 

diff  --git a/llvm/test/YAMLParser/spec1.2-07-05.test b/llvm/test/YAMLParser/spec1.2-07-05.test
index 3ea0e5aa37743e4..f923f68d04295f9 100644
--- a/llvm/test/YAMLParser/spec1.2-07-05.test
+++ b/llvm/test/YAMLParser/spec1.2-07-05.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
-# CHECK: "folded \nto a space,\t\n \nto a line feed, or \t  \tnon-content"
+# CHECK: "folded \nto a space,\t\n \nto a line feed, or \t \tnon-content"
 
 "folded 
 to a space,



___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [YAMLParser] Unfold multi-line scalar values (PR #70898)

2023-11-09 Thread Igor Kudrin via llvm-branch-commits

https://github.com/igorkudrin updated 
https://github.com/llvm/llvm-project/pull/70898

>From 37ab3fff62b1a3aa373fd513745b1c2b91b1b865 Mon Sep 17 00:00:00 2001
From: Igor Kudrin 
Date: Tue, 7 Nov 2023 18:42:02 -0800
Subject: [PATCH] [YAMLParser] Unfold multi-line scalar values

Long scalar values can be split across multiple lines to improve
readability. The rules are described in Section 6.5, "Line Folding",
https://yaml.org/spec/1.2.2/#65-line-folding. In addition, for flow
scalar styles, the spec states that "All leading and trailing white
space characters on each line are excluded from the content",
https://yaml.org/spec/1.2.2/#73-flow-scalar-styles.

The patch implements these unfolding rules for double-quoted,
single-quoted, and plain scalars.
---
 llvm/include/llvm/Support/YAMLParser.h|   9 +-
 llvm/lib/Support/YAMLParser.cpp   | 373 --
 llvm/test/YAMLParser/spec-05-13.test  |   2 +-
 llvm/test/YAMLParser/spec-05-14.test  |   2 +-
 llvm/test/YAMLParser/spec-09-01.test  |   4 +-
 llvm/test/YAMLParser/spec-09-02.test  |  18 +-
 llvm/test/YAMLParser/spec-09-03.test  |   6 +-
 llvm/test/YAMLParser/spec-09-04.test  |   2 +-
 llvm/test/YAMLParser/spec-09-05.test  |   6 +-
 llvm/test/YAMLParser/spec-09-07.test  |   4 +-
 llvm/test/YAMLParser/spec-09-08.test  |   8 +-
 llvm/test/YAMLParser/spec-09-09.test  |   6 +-
 llvm/test/YAMLParser/spec-09-10.test  |   2 +-
 llvm/test/YAMLParser/spec-09-11.test  |   4 +-
 llvm/test/YAMLParser/spec-09-13.test  |   4 +-
 llvm/test/YAMLParser/spec-09-16.test  |   8 +-
 llvm/test/YAMLParser/spec-09-17.test  |   2 +-
 llvm/test/YAMLParser/spec-10-02.test  |   6 +-
 llvm/test/YAMLParser/spec1.2-07-05.test   |   2 +-
 llvm/test/YAMLParser/spec1.2-07-06.test   |   2 +-
 llvm/test/YAMLParser/spec1.2-07-09.test   |   2 +-
 llvm/test/YAMLParser/spec1.2-07-12.test   |   2 +-
 llvm/unittests/Support/YAMLParserTest.cpp | 102 ++
 23 files changed, 376 insertions(+), 200 deletions(-)

diff --git a/llvm/include/llvm/Support/YAMLParser.h b/llvm/include/llvm/Support/YAMLParser.h
index f4767641647c217..9d95a1e13a0dff4 100644
--- a/llvm/include/llvm/Support/YAMLParser.h
+++ b/llvm/include/llvm/Support/YAMLParser.h
@@ -240,9 +240,14 @@ class ScalarNode final : public Node {
 private:
   StringRef Value;
 
-  StringRef unescapeDoubleQuoted(StringRef UnquotedValue,
-                                 StringRef::size_type Start,
+  StringRef getDoubleQuotedValue(StringRef UnquotedValue,
                                  SmallVectorImpl<char> &Storage) const;
+
+  static StringRef getSingleQuotedValue(StringRef RawValue,
+                                        SmallVectorImpl<char> &Storage);
+
+  static StringRef getPlainValue(StringRef RawValue,
+                                 SmallVectorImpl<char> &Storage);
 };
 
 /// A block scalar node is an opaque datum that can be presented as a
diff --git a/llvm/lib/Support/YAMLParser.cpp b/llvm/lib/Support/YAMLParser.cpp
index b47cb3ae3b44a75..fdd0ed6e682eb5e 100644
--- a/llvm/lib/Support/YAMLParser.cpp
+++ b/llvm/lib/Support/YAMLParser.cpp
@@ -2030,184 +2030,229 @@ bool Node::failed() const {
 }
 
 StringRef ScalarNode::getValue(SmallVectorImpl<char> &Storage) const {
-  // TODO: Handle newlines properly. We need to remove leading whitespace.
-  if (Value[0] == '"') { // Double quoted.
-    // Pull off the leading and trailing "s.
-    StringRef UnquotedValue = Value.substr(1, Value.size() - 2);
-    // Search for characters that would require unescaping the value.
-    StringRef::size_type i = UnquotedValue.find_first_of("\\\r\n");
-    if (i != StringRef::npos)
-      return unescapeDoubleQuoted(UnquotedValue, i, Storage);
+  if (Value[0] == '"')
+    return getDoubleQuotedValue(Value, Storage);
+  if (Value[0] == '\'')
+    return getSingleQuotedValue(Value, Storage);
+  return getPlainValue(Value, Storage);
+}
+
+/// parseScalarValue - A common parsing routine for all flow scalar styles.
+/// It handles line break characters by itself, adds regular content characters
+/// to the result, and forwards escape sequences to the provided routine for
+/// style-specific processing.
+///
+/// \param UnquotedValue - An input value without quotation marks.
+/// \param Storage - A storage for the result if the input value is multiline or
+/// contains escaped characters.
+/// \param LookupChars - A set of special characters to search in the input
+/// string. Should include line break characters and the escape character
+/// specific to the scalar style being processed, if any.
+/// \param UnescapeCallback - This is called when the escape character is found
+/// in the input.
+/// \returns - The unfolded and unescaped value.
+static StringRef
+parseScalarValue(StringRef UnquotedValue, SmallVectorImpl<char> &Storage,
+                 StringRef LookupChars,
+                 std::function<StringRef(StringRef, SmallVectorImpl<char> &)>
+                     UnescapeCallback) {
+  size_t I = UnquotedValue.find_first_of(LookupChars);
+  if (I == StringRef::npos)
 

[llvm-branch-commits] [llvm] b4e19d2 - [YAMLParser] Fix handling escaped line breaks in double-quoted scalars

2023-11-09 Thread Igor Kudrin via llvm-branch-commits

Author: Igor Kudrin
Date: 2023-11-09T13:51:04-08:00
New Revision: b4e19d2f0531c99167e3391f3742729c731d9c34

URL: 
https://github.com/llvm/llvm-project/commit/b4e19d2f0531c99167e3391f3742729c731d9c34
DIFF: 
https://github.com/llvm/llvm-project/commit/b4e19d2f0531c99167e3391f3742729c731d9c34.diff

LOG: [YAMLParser] Fix handling escaped line breaks in double-quoted scalars

Leading white space on the line following an escaped line break should
be excluded from the content.
See https://yaml.org/spec/1.2.2/#731-double-quoted-style.

Added: 


Modified: 
llvm/lib/Support/YAMLParser.cpp
llvm/test/YAMLParser/spec-09-02.test
llvm/test/YAMLParser/spec-09-04.test
llvm/test/YAMLParser/spec1.2-07-05.test

Removed: 




diff  --git a/llvm/lib/Support/YAMLParser.cpp b/llvm/lib/Support/YAMLParser.cpp
index 17d727b6cc07da8..b47cb3ae3b44a75 100644
--- a/llvm/lib/Support/YAMLParser.cpp
+++ b/llvm/lib/Support/YAMLParser.cpp
@@ -2107,14 +2107,13 @@ StringRef ScalarNode::unescapeDoubleQuoted( StringRef UnquotedValue
       return "";
     }
     case '\r':
+      // Shrink the Windows-style EOL.
+      if (UnquotedValue.size() >= 2 && UnquotedValue[1] == '\n')
+        UnquotedValue = UnquotedValue.drop_front(1);
+      [[fallthrough]];
     case '\n':
-      // Remove the new line.
-      if (   UnquotedValue.size() > 1
-          && (UnquotedValue[1] == '\r' || UnquotedValue[1] == '\n'))
-        UnquotedValue = UnquotedValue.substr(1);
-      // If this was just a single byte newline, it will get skipped
-      // below.
-      break;
+      UnquotedValue = UnquotedValue.drop_front(1).ltrim(" \t");
+      continue;
     case '0':
       Storage.push_back(0x00);
       break;

diff  --git a/llvm/test/YAMLParser/spec-09-02.test b/llvm/test/YAMLParser/spec-09-02.test
index 6b68a00e3fc3e6f..51ea61dd23273d3 100644
--- a/llvm/test/YAMLParser/spec-09-02.test
+++ b/llvm/test/YAMLParser/spec-09-02.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s 2>&1 | FileCheck %s --strict-whitespace
-# CHECK: "as space\n trimmed \n specific\L\n escaped\t \n none"
+# CHECK: "as space\n trimmed \n specific\L\n escaped\t\n none"
 
 ## Note: The example was originally taken from Spec 1.1, but the parsing rules
 ## have been changed since then.

diff  --git a/llvm/test/YAMLParser/spec-09-04.test b/llvm/test/YAMLParser/spec-09-04.test
index 1e904eaa70992e5..e4f77ea83c7ac5f 100644
--- a/llvm/test/YAMLParser/spec-09-04.test
+++ b/llvm/test/YAMLParser/spec-09-04.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
-# CHECK: "first\n \tinner 1\t\n  inner 2  last"
+# CHECK: "first\n \tinner 1\t\n  inner 2 last"
 
  "first
inner 1 

diff  --git a/llvm/test/YAMLParser/spec1.2-07-05.test b/llvm/test/YAMLParser/spec1.2-07-05.test
index 3ea0e5aa37743e4..f923f68d04295f9 100644
--- a/llvm/test/YAMLParser/spec1.2-07-05.test
+++ b/llvm/test/YAMLParser/spec1.2-07-05.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
-# CHECK: "folded \nto a space,\t\n \nto a line feed, or \t  \tnon-content"
+# CHECK: "folded \nto a space,\t\n \nto a line feed, or \t \tnon-content"
 
 "folded 
 to a space,





[llvm-branch-commits] [llvm] [YAMLParser] Fix handling escaped line breaks in double-quoted scalars (PR #71775)

2023-11-09 Thread Igor Kudrin via llvm-branch-commits

https://github.com/igorkudrin updated 
https://github.com/llvm/llvm-project/pull/71775

>From b4e19d2f0531c99167e3391f3742729c731d9c34 Mon Sep 17 00:00:00 2001
From: Igor Kudrin 
Date: Wed, 8 Nov 2023 20:48:49 -0800
Subject: [PATCH] [YAMLParser] Fix handling escaped line breaks in
 double-quoted scalars

Leading white space on the line following an escaped line break should
be excluded from the content.
See https://yaml.org/spec/1.2.2/#731-double-quoted-style.
---
 llvm/lib/Support/YAMLParser.cpp | 13 ++---
 llvm/test/YAMLParser/spec-09-02.test|  2 +-
 llvm/test/YAMLParser/spec-09-04.test|  2 +-
 llvm/test/YAMLParser/spec1.2-07-05.test |  2 +-
 4 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/llvm/lib/Support/YAMLParser.cpp b/llvm/lib/Support/YAMLParser.cpp
index 17d727b6cc07da8..b47cb3ae3b44a75 100644
--- a/llvm/lib/Support/YAMLParser.cpp
+++ b/llvm/lib/Support/YAMLParser.cpp
@@ -2107,14 +2107,13 @@ StringRef ScalarNode::unescapeDoubleQuoted( StringRef UnquotedValue
       return "";
     }
     case '\r':
+      // Shrink the Windows-style EOL.
+      if (UnquotedValue.size() >= 2 && UnquotedValue[1] == '\n')
+        UnquotedValue = UnquotedValue.drop_front(1);
+      [[fallthrough]];
     case '\n':
-      // Remove the new line.
-      if (   UnquotedValue.size() > 1
-          && (UnquotedValue[1] == '\r' || UnquotedValue[1] == '\n'))
-        UnquotedValue = UnquotedValue.substr(1);
-      // If this was just a single byte newline, it will get skipped
-      // below.
-      break;
+      UnquotedValue = UnquotedValue.drop_front(1).ltrim(" \t");
+      continue;
     case '0':
       Storage.push_back(0x00);
       break;
diff --git a/llvm/test/YAMLParser/spec-09-02.test b/llvm/test/YAMLParser/spec-09-02.test
index 6b68a00e3fc3e6f..51ea61dd23273d3 100644
--- a/llvm/test/YAMLParser/spec-09-02.test
+++ b/llvm/test/YAMLParser/spec-09-02.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s 2>&1 | FileCheck %s --strict-whitespace
-# CHECK: "as space\n trimmed \n specific\L\n escaped\t \n none"
+# CHECK: "as space\n trimmed \n specific\L\n escaped\t\n none"
 
 ## Note: The example was originally taken from Spec 1.1, but the parsing rules
 ## have been changed since then.
diff --git a/llvm/test/YAMLParser/spec-09-04.test b/llvm/test/YAMLParser/spec-09-04.test
index 1e904eaa70992e5..e4f77ea83c7ac5f 100644
--- a/llvm/test/YAMLParser/spec-09-04.test
+++ b/llvm/test/YAMLParser/spec-09-04.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
-# CHECK: "first\n \tinner 1\t\n  inner 2  last"
+# CHECK: "first\n \tinner 1\t\n  inner 2 last"
 
  "first
inner 1 
diff --git a/llvm/test/YAMLParser/spec1.2-07-05.test b/llvm/test/YAMLParser/spec1.2-07-05.test
index 3ea0e5aa37743e4..f923f68d04295f9 100644
--- a/llvm/test/YAMLParser/spec1.2-07-05.test
+++ b/llvm/test/YAMLParser/spec1.2-07-05.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
-# CHECK: "folded \nto a space,\t\n \nto a line feed, or \t  \tnon-content"
+# CHECK: "folded \nto a space,\t\n \nto a line feed, or \t \tnon-content"
 
 "folded 
 to a space,



[llvm-branch-commits] [llvm] 9a6f97c - [YAMLParser] Enable tests for flow scalar styles. NFC

2023-11-09 Thread Igor Kudrin via llvm-branch-commits

Author: Igor Kudrin
Date: 2023-11-09T13:48:06-08:00
New Revision: 9a6f97c327be5a5380c29295a6f73a1ec81ca41d

URL: 
https://github.com/llvm/llvm-project/commit/9a6f97c327be5a5380c29295a6f73a1ec81ca41d
DIFF: 
https://github.com/llvm/llvm-project/commit/9a6f97c327be5a5380c29295a6f73a1ec81ca41d.diff

LOG: [YAMLParser] Enable tests for flow scalar styles. NFC

This is a preparatory commit for #70898 and #71775. It enables the checks in
the tests for single-quoted, double-quoted, and plain scalars and records how
they are currently handled.

Added: 
llvm/test/YAMLParser/spec1.2-07-05.test
llvm/test/YAMLParser/spec1.2-07-06.test
llvm/test/YAMLParser/spec1.2-07-09.test
llvm/test/YAMLParser/spec1.2-07-12.test

Modified: 
llvm/test/YAMLParser/spec-02-17.test
llvm/test/YAMLParser/spec-05-13.test
llvm/test/YAMLParser/spec-05-14.test
llvm/test/YAMLParser/spec-09-01.test
llvm/test/YAMLParser/spec-09-02.test
llvm/test/YAMLParser/spec-09-03.test
llvm/test/YAMLParser/spec-09-04.test
llvm/test/YAMLParser/spec-09-05.test
llvm/test/YAMLParser/spec-09-06.test
llvm/test/YAMLParser/spec-09-07.test
llvm/test/YAMLParser/spec-09-08.test
llvm/test/YAMLParser/spec-09-09.test
llvm/test/YAMLParser/spec-09-10.test
llvm/test/YAMLParser/spec-09-11.test
llvm/test/YAMLParser/spec-09-13.test
llvm/test/YAMLParser/spec-09-16.test
llvm/test/YAMLParser/spec-09-17.test
llvm/test/YAMLParser/spec-10-02.test

Removed: 




diff  --git a/llvm/test/YAMLParser/spec-02-17.test b/llvm/test/YAMLParser/spec-02-17.test
index 2bcb60c8d933bd8..e7b0147a0fcd89f 100644
--- a/llvm/test/YAMLParser/spec-02-17.test
+++ b/llvm/test/YAMLParser/spec-02-17.test
@@ -1,4 +1,4 @@
-# RUN: yaml-bench -canonical %s
+# RUN: yaml-bench -canonical %s | FileCheck %s
 
 unicode: "Sosa did fine.\u263A"
 control: "\b1998\t1999\t2000\n"

diff  --git a/llvm/test/YAMLParser/spec-05-13.test b/llvm/test/YAMLParser/spec-05-13.test
index db62e866a755a32..e7ec42a4aaa80d7 100644
--- a/llvm/test/YAMLParser/spec-05-13.test
+++ b/llvm/test/YAMLParser/spec-05-13.test
@@ -1,4 +1,5 @@
-# RUN: yaml-bench -canonical %s
+# RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
+# CHECK: "Text containing   \n  both space and\t\n  \ttab\tcharacters"
 
   "Text containing   
   both space and   

diff  --git a/llvm/test/YAMLParser/spec-05-14.test b/llvm/test/YAMLParser/spec-05-14.test
index 65451651b69e96b..984f3721312ab63 100644
--- a/llvm/test/YAMLParser/spec-05-14.test
+++ b/llvm/test/YAMLParser/spec-05-14.test
@@ -1,4 +1,4 @@
-# RUN: yaml-bench -canonical %s
+# RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
 
 "Fun with \\
 \" \a \b \e \f \

diff  --git a/llvm/test/YAMLParser/spec-09-01.test b/llvm/test/YAMLParser/spec-09-01.test
index 8999b4961626470..2b5a6f31166ddf1 100644
--- a/llvm/test/YAMLParser/spec-09-01.test
+++ b/llvm/test/YAMLParser/spec-09-01.test
@@ -1,4 +1,13 @@
-# RUN: yaml-bench -canonical %s
+# RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
+# CHECK:  !!map {
+# CHECK-NEXT:   ? !!str "simple key"
+# CHECK-NEXT:   : !!map {
+# CHECK-NEXT: ? !!str "also simple"
+# CHECK-NEXT: : !!str "value",
+# CHECK-NEXT: ? !!str "not a\n  simple key"
+# CHECK-NEXT: : !!str "any\n  value",
+# CHECK-NEXT:   },
+# CHECK-NEXT: }
 
 "simple key" : {
   "also simple" : value,

diff  --git a/llvm/test/YAMLParser/spec-09-02.test b/llvm/test/YAMLParser/spec-09-02.test
index 3f8e49a8bd31079..6b68a00e3fc3e6f 100644
--- a/llvm/test/YAMLParser/spec-09-02.test
+++ b/llvm/test/YAMLParser/spec-09-02.test
@@ -1,14 +1,17 @@
-# RUN: yaml-bench -canonical %s 2>&1 | FileCheck %s
+# RUN: yaml-bench -canonical %s 2>&1 | FileCheck %s --strict-whitespace
+# CHECK: "as space\n trimmed \n specific\L\n escaped\t \n none"
 
- "as space
- trimmed
+## Note: The example was originally taken from Spec 1.1, but the parsing rules
+## have been changed since then.
+## * The paragraph-separator character '\u2029' is excluded from line-break
+##   characters, so the original sequence "escaped\t\\\u2029" is no longer
+##   considered valid. This is replaced by "escaped\t\\\n" in the test source.
+## See https://yaml.org/spec/1.2.2/ext/changes/ for details.
 
- specific
+ "as space
+ trimmed 
 
+ specific

  escaped   \
+ 
  none"
-
-# FIXME: The string below should actually be
-#   "as space trimmed\nspecific\nescaped\tnone", but the parser currently has
-#   a bug when parsing multiline quoted strings.
-# CHECK: !!str "as space\n trimmed\n specific\n escaped\t none"

diff  --git a/llvm/test/YAMLParser/spec-09-03.test b/llvm/test/YAMLParser/spec-09-03.test
index 3fb0d8b184abb16..c656058b7ff8b3e 100644
--- a/llvm/test/YAMLParser/spec-09-03.test
+++ b/llvm/test/YAMLParser/spec-09-03.test
@@ -1,4 +1,9 @@
-# RUN: yaml-bench -canonical %s
+# RUN: yaml-bench -canonical %s | FileCheck %s 

[llvm-branch-commits] [llvm] [YAMLParser] Unfold multi-line scalar values (PR #70898)

2023-11-08 Thread Igor Kudrin via llvm-branch-commits

igorkudrin wrote:

> I don't mean to make existing debt your problem, but if it isn't too much 
> work could you post a pre-patch that just adds the `FileCheck`s to the 
> existing tests where the behavior changes, so the test diff is more 
> self-documenting?

* Added #71774 for the tests.
* Also added a unittest, `YAMLParser.UnfoldsScalarValue`, to check various
combinations of line breaks and other characters. A `gtest`-based test seems
better suited for this than a lit one.
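The folding rules that such a test exercises can be illustrated without LLVM. Below is a minimal, hypothetical C++17 sketch (`foldFlowLines` is an illustrative name, not part of `llvm::yaml`, and this is not the code from #70898). It assumes the quotes have already been stripped and that the input contains no escape sequences.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not the LLVM API: folds the line breaks of an
// already-unquoted flow scalar that contains no escape sequences, following
// the YAML 1.2 flow folding rules:
//  * spaces/tabs before a line break are dropped,
//  * spaces/tabs at the start of the next line are dropped,
//  * a single line break becomes one space,
//  * a run of N line breaks (N > 1) becomes N - 1 '\n' characters.
std::string foldFlowLines(std::string_view In) {
  std::string Out;
  size_t Pos = 0;
  while (Pos < In.size()) {
    if (In[Pos] != '\n' && In[Pos] != '\r') {
      Out.push_back(In[Pos++]);
      continue;
    }
    // Trailing white space of the current line is not content.
    while (!Out.empty() && (Out.back() == ' ' || Out.back() == '\t'))
      Out.pop_back();
    // Consume the whole run of line breaks together with the indentation
    // after each of them, so whitespace-only lines count as empty lines.
    size_t Breaks = 0;
    while (Pos < In.size() && (In[Pos] == '\n' || In[Pos] == '\r')) {
      // A Windows-style "\r\n" is a single line break.
      bool CRLF = In[Pos] == '\r' && Pos + 1 < In.size() && In[Pos + 1] == '\n';
      Pos += CRLF ? 2 : 1;
      ++Breaks;
      while (Pos < In.size() && (In[Pos] == ' ' || In[Pos] == '\t'))
        ++Pos;
    }
    if (Breaks == 1)
      Out.push_back(' ');
    else
      Out.append(Breaks - 1, '\n');
  }
  return Out;
}
```

For example, the body of Example 7.6 of the YAML 1.2 spec, `" 1st non-empty\n\n 2nd non-empty \n\t3rd non-empty "`, folds to `" 1st non-empty\n2nd non-empty 3rd non-empty "`.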

https://github.com/llvm/llvm-project/pull/70898
___
llvm-branch-commits mailing list
llvm-branch-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits


[llvm-branch-commits] [llvm] [YAMLParser] Unfold multi-line scalar values (PR #70898)

2023-11-08 Thread Igor Kudrin via llvm-branch-commits


@@ -2030,187 +2030,219 @@ bool Node::failed() const {
 }
 
 StringRef ScalarNode::getValue(SmallVectorImpl<char> &Storage) const {
-  // TODO: Handle newlines properly. We need to remove leading whitespace.
-  if (Value[0] == '"') { // Double quoted.
-// Pull off the leading and trailing "s.
-StringRef UnquotedValue = Value.substr(1, Value.size() - 2);
-// Search for characters that would require unescaping the value.
-StringRef::size_type i = UnquotedValue.find_first_of("\\\r\n");
-if (i != StringRef::npos)
-  return unescapeDoubleQuoted(UnquotedValue, i, Storage);
+  if (Value[0] == '"')
+return getDoubleQuotedValue(Value, Storage);
+  if (Value[0] == '\'')
+return getSingleQuotedValue(Value, Storage);
+  return getPlainValue(Value, Storage);
+}
+
+static StringRef
+parseScalarValue(StringRef UnquotedValue, SmallVectorImpl<char> &Storage,
+ StringRef LookupChars,

igorkudrin wrote:

Added a description for the function and its arguments.
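The hunk above dispatches on the first character to `getDoubleQuotedValue()`, `getSingleQuotedValue()`, or `getPlainValue()`. For the single-quoted style, the old `getValue()` branch only had to turn `''` into `'`. A minimal, hypothetical C++17 sketch of that step follows (`unescapeSingleQuoted` is an illustrative name, and line folding is deliberately left out); like the parser, it returns the input unchanged when there is nothing to unescape.

```cpp
#include <algorithm>
#include <string>
#include <string_view>

// Hypothetical helper modelled on the old single-quoted branch of
// ScalarNode::getValue(): inside '...' the only escape is '' for a single '.
// When there is nothing to unescape, the input view is returned as-is;
// otherwise the result is built in the caller-provided Storage.
std::string_view unescapeSingleQuoted(std::string_view Unquoted,
                                      std::string &Storage) {
  size_t I = Unquoted.find('\'');
  if (I == std::string_view::npos)
    return Unquoted; // Fast path: no copy needed.
  Storage.clear();
  Storage.reserve(Unquoted.size());
  while (I != std::string_view::npos) {
    Storage.append(Unquoted.substr(0, I)); // Text before the '' pair.
    Storage.push_back('\'');               // The pair stands for one quote.
    // Skip both quotes; std::min guards against a malformed trailing quote.
    Unquoted.remove_prefix(std::min(I + 2, Unquoted.size()));
    I = Unquoted.find('\'');
  }
  Storage.append(Unquoted);
  return Storage;
}
```

The returned view may point into `Storage`, so the caller must keep `Storage` alive while using the result, the same contract as `SmallVectorImpl<char> &Storage` in `getValue()`.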

https://github.com/llvm/llvm-project/pull/70898


[llvm-branch-commits] [llvm] [YAMLParser] Unfold multi-line scalar values (PR #70898)

2023-11-08 Thread Igor Kudrin via llvm-branch-commits


@@ -2030,187 +2030,219 @@ bool Node::failed() const {
 }
 
 StringRef ScalarNode::getValue(SmallVectorImpl<char> &Storage) const {
-  // TODO: Handle newlines properly. We need to remove leading whitespace.
-  if (Value[0] == '"') { // Double quoted.
-// Pull off the leading and trailing "s.
-StringRef UnquotedValue = Value.substr(1, Value.size() - 2);
-// Search for characters that would require unescaping the value.
-StringRef::size_type i = UnquotedValue.find_first_of("\\\r\n");
-if (i != StringRef::npos)
-  return unescapeDoubleQuoted(UnquotedValue, i, Storage);
+  if (Value[0] == '"')
+return getDoubleQuotedValue(Value, Storage);
+  if (Value[0] == '\'')
+return getSingleQuotedValue(Value, Storage);
+  return getPlainValue(Value, Storage);
+}
+
+static StringRef
+parseScalarValue(StringRef UnquotedValue, SmallVectorImpl<char> &Storage,
+ StringRef LookupChars,
+ std::function<StringRef(StringRef, SmallVectorImpl<char> &)>
+ UnescapeCallback) {
+  size_t I = UnquotedValue.find_first_of(LookupChars);
+  if (I == StringRef::npos)
 return UnquotedValue;
-  } else if (Value[0] == '\'') { // Single quoted.
-// Pull off the leading and trailing 's.
-StringRef UnquotedValue = Value.substr(1, Value.size() - 2);
-StringRef::size_type i = UnquotedValue.find('\'');
-if (i != StringRef::npos) {
-  // We're going to need Storage.
-  Storage.clear();
-  Storage.reserve(UnquotedValue.size());
-  for (; i != StringRef::npos; i = UnquotedValue.find('\'')) {
-StringRef Valid(UnquotedValue.begin(), i);
-llvm::append_range(Storage, Valid);
-Storage.push_back('\'');
-UnquotedValue = UnquotedValue.substr(i + 2);
-  }
-  llvm::append_range(Storage, UnquotedValue);
-  return StringRef(Storage.begin(), Storage.size());
-}
-return UnquotedValue;
-  }
-  // Plain.
-  // Trim whitespace ('b-char' and 's-white').
-  // NOTE: Alternatively we could change the scanner to not include whitespace
-  //   here in the first place.
-  return Value.rtrim("\x0A\x0D\x20\x09");
-}
 
-StringRef ScalarNode::unescapeDoubleQuoted( StringRef UnquotedValue
-  , StringRef::size_type i
-  , SmallVectorImpl<char> &Storage)
-  const {
-  // Use Storage to build proper value.
   Storage.clear();
   Storage.reserve(UnquotedValue.size());
-  for (; i != StringRef::npos; i = UnquotedValue.find_first_of("\\\r\n")) {
-// Insert all previous chars into Storage.
-StringRef Valid(UnquotedValue.begin(), i);
-llvm::append_range(Storage, Valid);
-// Chop off inserted chars.
-UnquotedValue = UnquotedValue.substr(i);
-
-assert(!UnquotedValue.empty() && "Can't be empty!");
-
-// Parse escape or line break.
-switch (UnquotedValue[0]) {
-case '\r':
-case '\n':
-  Storage.push_back('\n');
-  if (   UnquotedValue.size() > 1
-  && (UnquotedValue[1] == '\r' || UnquotedValue[1] == '\n'))
-UnquotedValue = UnquotedValue.substr(1);
-  UnquotedValue = UnquotedValue.substr(1);
-  break;
-default:
-  if (UnquotedValue.size() == 1) {
-Token T;
-T.Range = StringRef(UnquotedValue.begin(), 1);
-setError("Unrecognized escape code", T);
-return "";
-  }
-  UnquotedValue = UnquotedValue.substr(1);
-  switch (UnquotedValue[0]) {
-  default: {
-  Token T;
-  T.Range = StringRef(UnquotedValue.begin(), 1);
-  setError("Unrecognized escape code", T);
-  return "";
-}
-  case '\r':
-  case '\n':
-// Remove the new line.
-if (   UnquotedValue.size() > 1
-&& (UnquotedValue[1] == '\r' || UnquotedValue[1] == '\n'))
-  UnquotedValue = UnquotedValue.substr(1);
-// If this was just a single byte newline, it will get skipped
-// below.
-break;
-  case '0':
-Storage.push_back(0x00);
-break;
-  case 'a':
-Storage.push_back(0x07);
-break;
-  case 'b':
-Storage.push_back(0x08);
-break;
-  case 't':
-  case 0x09:
-Storage.push_back(0x09);
-break;
-  case 'n':
-Storage.push_back(0x0A);
-break;
-  case 'v':
-Storage.push_back(0x0B);
-break;
-  case 'f':
-Storage.push_back(0x0C);
-break;
-  case 'r':
-Storage.push_back(0x0D);
-break;
-  case 'e':
-Storage.push_back(0x1B);
-break;
+  char LastNewLineAddedAs = '\0';
+  for (; I != StringRef::npos; I = UnquotedValue.find_first_of(LookupChars)) {
+if (UnquotedValue[I] != '\x0D' && UnquotedValue[I] != '\x0A') {

igorkudrin wrote:

The idea was to stay a bit closer to the spec, where all special characters are
defined by their code values. I changed them back to mnemonics in the last update.
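In C++ the two spellings are interchangeable (`'\n' == '\x0A'`); only readability differs. As an illustration, a hypothetical helper (not the YAMLParser code) can decode the single-byte escapes of a double-quoted YAML scalar using mnemonics while keeping the spec's code points in comments:

```cpp
#include <optional>

// Hypothetical helper: maps the character after '\' in a double-quoted YAML
// scalar to the byte it denotes, for single-byte escapes only. Mnemonics are
// used in the code; the YAML 1.2 code points are kept in the comments.
std::optional<char> decodeSingleByteEscape(char C) {
  switch (C) {
  case '0':  return '\0';   // x00
  case 'a':  return '\a';   // x07
  case 'b':  return '\b';   // x08
  case 't':
  case '\t': return '\t';   // x09 (both "\t" and backslash + TAB)
  case 'n':  return '\n';   // x0A
  case 'v':  return '\v';   // x0B
  case 'f':  return '\f';   // x0C
  case 'r':  return '\r';   // x0D
  case 'e':  return '\x1B'; // x1B, C++ has no mnemonic for ESC
  case ' ':  return ' ';    // x20
  case '"':  return '"';    // x22
  case '/':  return '/';    // x2F
  case '\\': return '\\';   // x5C
  default:
    // Multi-byte escapes (\N, \_, \L, \P, \x, \u, \U) or an invalid code.
    return std::nullopt;
  }
}
```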


[llvm-branch-commits] [llvm] [YAMLParser] Unfold multi-line scalar values (PR #70898)

2023-11-08 Thread Igor Kudrin via llvm-branch-commits


@@ -2030,187 +2030,219 @@ bool Node::failed() const {
 }
 
 StringRef ScalarNode::getValue(SmallVectorImpl<char> &Storage) const {
-  // TODO: Handle newlines properly. We need to remove leading whitespace.
-  if (Value[0] == '"') { // Double quoted.
-// Pull off the leading and trailing "s.
-StringRef UnquotedValue = Value.substr(1, Value.size() - 2);
-// Search for characters that would require unescaping the value.
-StringRef::size_type i = UnquotedValue.find_first_of("\\\r\n");
-if (i != StringRef::npos)
-  return unescapeDoubleQuoted(UnquotedValue, i, Storage);
+  if (Value[0] == '"')
+return getDoubleQuotedValue(Value, Storage);
+  if (Value[0] == '\'')
+return getSingleQuotedValue(Value, Storage);
+  return getPlainValue(Value, Storage);
+}
+
+static StringRef
+parseScalarValue(StringRef UnquotedValue, SmallVectorImpl<char> &Storage,
+ StringRef LookupChars,
+ std::function<StringRef(StringRef, SmallVectorImpl<char> &)>
+ UnescapeCallback) {
+  size_t I = UnquotedValue.find_first_of(LookupChars);
+  if (I == StringRef::npos)
 return UnquotedValue;
-  } else if (Value[0] == '\'') { // Single quoted.
-// Pull off the leading and trailing 's.
-StringRef UnquotedValue = Value.substr(1, Value.size() - 2);
-StringRef::size_type i = UnquotedValue.find('\'');
-if (i != StringRef::npos) {
-  // We're going to need Storage.
-  Storage.clear();
-  Storage.reserve(UnquotedValue.size());
-  for (; i != StringRef::npos; i = UnquotedValue.find('\'')) {
-StringRef Valid(UnquotedValue.begin(), i);
-llvm::append_range(Storage, Valid);
-Storage.push_back('\'');
-UnquotedValue = UnquotedValue.substr(i + 2);
-  }
-  llvm::append_range(Storage, UnquotedValue);
-  return StringRef(Storage.begin(), Storage.size());
-}
-return UnquotedValue;
-  }
-  // Plain.
-  // Trim whitespace ('b-char' and 's-white').
-  // NOTE: Alternatively we could change the scanner to not include whitespace
-  //   here in the first place.
-  return Value.rtrim("\x0A\x0D\x20\x09");
-}
 
-StringRef ScalarNode::unescapeDoubleQuoted( StringRef UnquotedValue
-  , StringRef::size_type i
-  , SmallVectorImpl<char> &Storage)
-  const {
-  // Use Storage to build proper value.
   Storage.clear();
   Storage.reserve(UnquotedValue.size());
-  for (; i != StringRef::npos; i = UnquotedValue.find_first_of("\\\r\n")) {
-// Insert all previous chars into Storage.
-StringRef Valid(UnquotedValue.begin(), i);
-llvm::append_range(Storage, Valid);
-// Chop off inserted chars.
-UnquotedValue = UnquotedValue.substr(i);
-
-assert(!UnquotedValue.empty() && "Can't be empty!");
-
-// Parse escape or line break.
-switch (UnquotedValue[0]) {
-case '\r':
-case '\n':
-  Storage.push_back('\n');
-  if (   UnquotedValue.size() > 1
-  && (UnquotedValue[1] == '\r' || UnquotedValue[1] == '\n'))
-UnquotedValue = UnquotedValue.substr(1);
-  UnquotedValue = UnquotedValue.substr(1);
-  break;
-default:
-  if (UnquotedValue.size() == 1) {
-Token T;
-T.Range = StringRef(UnquotedValue.begin(), 1);
-setError("Unrecognized escape code", T);
-return "";
-  }
-  UnquotedValue = UnquotedValue.substr(1);
-  switch (UnquotedValue[0]) {
-  default: {
-  Token T;
-  T.Range = StringRef(UnquotedValue.begin(), 1);
-  setError("Unrecognized escape code", T);
-  return "";
-}
-  case '\r':
-  case '\n':
-// Remove the new line.
-if (   UnquotedValue.size() > 1
-&& (UnquotedValue[1] == '\r' || UnquotedValue[1] == '\n'))
-  UnquotedValue = UnquotedValue.substr(1);
-// If this was just a single byte newline, it will get skipped
-// below.
-break;
-  case '0':
-Storage.push_back(0x00);
-break;
-  case 'a':
-Storage.push_back(0x07);
-break;
-  case 'b':
-Storage.push_back(0x08);
-break;
-  case 't':
-  case 0x09:
-Storage.push_back(0x09);
-break;
-  case 'n':
-Storage.push_back(0x0A);
-break;
-  case 'v':
-Storage.push_back(0x0B);
-break;
-  case 'f':
-Storage.push_back(0x0C);
-break;
-  case 'r':
-Storage.push_back(0x0D);
-break;
-  case 'e':
-Storage.push_back(0x1B);
-break;
+  char LastNewLineAddedAs = '\0';
+  for (; I != StringRef::npos; I = UnquotedValue.find_first_of(LookupChars)) {
+if (UnquotedValue[I] != '\x0D' && UnquotedValue[I] != '\x0A') {
+  llvm::append_range(Storage, UnquotedValue.take_front(I));
+  UnquotedValue = UnescapeCallback(UnquotedValue.drop_front(I), Storage);
+  LastNewLineAddedAs = '\0';
+  continue;
+}
+if 

[llvm-branch-commits] [llvm] [YAMLParser] Fix handling escaped line breaks in double-quoted scalars (PR #71775)

2023-11-08 Thread Igor Kudrin via llvm-branch-commits

https://github.com/igorkudrin edited 
https://github.com/llvm/llvm-project/pull/71775


[llvm-branch-commits] [llvm] [YAMLParser] Unfold multi-line scalar values (PR #70898)

2023-11-08 Thread Igor Kudrin via llvm-branch-commits

https://github.com/igorkudrin edited 
https://github.com/llvm/llvm-project/pull/70898


[llvm-branch-commits] [llvm] 69bd4da - [YAMLParser] Fix handling escaped line breaks in double-quoted scalars

2023-11-08 Thread Igor Kudrin via llvm-branch-commits

Author: Igor Kudrin
Date: 2023-11-08T21:02:13-08:00
New Revision: 69bd4da46c438ce23ec0773f1d38abee800e6ed4

URL: https://github.com/llvm/llvm-project/commit/69bd4da46c438ce23ec0773f1d38abee800e6ed4
DIFF: https://github.com/llvm/llvm-project/commit/69bd4da46c438ce23ec0773f1d38abee800e6ed4.diff

LOG: [YAMLParser] Fix handling escaped line breaks in double-quoted scalars

Leading white space on the line following an escaped line break should
be excluded from the content.
See https://yaml.org/spec/1.2.2/#731-double-quoted-style.
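To make the rule concrete, here is a minimal standalone C++17 sketch. `joinEscapedLineBreaks` is a hypothetical name; unlike the real `unescapeDoubleQuoted()`, it handles only escaped line breaks and copies every other escape through unchanged. It mirrors the patched logic: shrink `\r\n` to one break, then drop the break and `ltrim(" \t")` the next line.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper, not LLVM's implementation: handles only escaped line
// breaks in the body of a double-quoted YAML scalar. A backslash directly
// followed by a line break ("\n", "\r", or "\r\n") is dropped together with
// that break and with the spaces/tabs that start the next line. Every other
// escape sequence is copied through untouched.
std::string joinEscapedLineBreaks(std::string_view In) {
  std::string Out;
  size_t I = 0;
  while (I < In.size()) {
    if (In[I] != '\\' || I + 1 == In.size()) {
      Out.push_back(In[I++]);
      continue;
    }
    size_t Next = I + 1;
    if (In[Next] != '\n' && In[Next] != '\r') {
      // Some other escape, e.g. "\t" or "\\": keep both characters as-is.
      Out.append(In.substr(I, 2));
      I += 2;
      continue;
    }
    // Shrink a Windows-style EOL so that "\r\n" counts as one break.
    if (In[Next] == '\r' && Next + 1 < In.size() && In[Next + 1] == '\n')
      ++Next;
    // Skip the break, then the leading white space of the following line.
    I = Next + 1;
    while (I < In.size() && (In[I] == ' ' || In[I] == '\t'))
      ++I;
  }
  return Out;
}
```

With this rule, the escaped break in `"escaped\t\\\n none"` no longer leaves the space that starts the next line in the value, which is the kind of change visible in the updated `spec-09-02.test` expectation below.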

Added: 


Modified: 
llvm/lib/Support/YAMLParser.cpp
llvm/test/YAMLParser/spec-09-02.test
llvm/test/YAMLParser/spec-09-04.test
llvm/test/YAMLParser/spec1.2-07-05.test

Removed: 




diff  --git a/llvm/lib/Support/YAMLParser.cpp b/llvm/lib/Support/YAMLParser.cpp
index 17d727b6cc07da8..b47cb3ae3b44a75 100644
--- a/llvm/lib/Support/YAMLParser.cpp
+++ b/llvm/lib/Support/YAMLParser.cpp
@@ -2107,14 +2107,13 @@ StringRef ScalarNode::unescapeDoubleQuoted( StringRef UnquotedValue
           return "";
         }
       case '\r':
+        // Shrink the Windows-style EOL.
+        if (UnquotedValue.size() >= 2 && UnquotedValue[1] == '\n')
+          UnquotedValue = UnquotedValue.drop_front(1);
+        [[fallthrough]];
       case '\n':
-        // Remove the new line.
-        if (   UnquotedValue.size() > 1
-            && (UnquotedValue[1] == '\r' || UnquotedValue[1] == '\n'))
-          UnquotedValue = UnquotedValue.substr(1);
-        // If this was just a single byte newline, it will get skipped
-        // below.
-        break;
+        UnquotedValue = UnquotedValue.drop_front(1).ltrim(" \t");
+        continue;
       case '0':
         Storage.push_back(0x00);
         break;

diff  --git a/llvm/test/YAMLParser/spec-09-02.test b/llvm/test/YAMLParser/spec-09-02.test
index 6b68a00e3fc3e6f..51ea61dd23273d3 100644
--- a/llvm/test/YAMLParser/spec-09-02.test
+++ b/llvm/test/YAMLParser/spec-09-02.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s 2>&1 | FileCheck %s --strict-whitespace
-# CHECK: "as space\n trimmed \n specific\L\n escaped\t \n none"
+# CHECK: "as space\n trimmed \n specific\L\n escaped\t\n none"
 
 ## Note: The example was originally taken from Spec 1.1, but the parsing rules
 ## have been changed since then.

diff  --git a/llvm/test/YAMLParser/spec-09-04.test b/llvm/test/YAMLParser/spec-09-04.test
index 1e904eaa70992e5..e4f77ea83c7ac5f 100644
--- a/llvm/test/YAMLParser/spec-09-04.test
+++ b/llvm/test/YAMLParser/spec-09-04.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
-# CHECK: "first\n \tinner 1\t\n  inner 2  last"
+# CHECK: "first\n \tinner 1\t\n  inner 2 last"
 
  "first
inner 1 

diff  --git a/llvm/test/YAMLParser/spec1.2-07-05.test b/llvm/test/YAMLParser/spec1.2-07-05.test
index 3ea0e5aa37743e4..f923f68d04295f9 100644
--- a/llvm/test/YAMLParser/spec1.2-07-05.test
+++ b/llvm/test/YAMLParser/spec1.2-07-05.test
@@ -1,5 +1,5 @@
 # RUN: yaml-bench -canonical %s | FileCheck %s --strict-whitespace
-# CHECK: "folded \nto a space,\t\n \nto a line feed, or \t  \tnon-content"
+# CHECK: "folded \nto a space,\t\n \nto a line feed, or \t \tnon-content"
 
 "folded 
 to a space,





[llvm-branch-commits] [libcxx] 7803636 - [libcxx testing] Fix UB in tests for std::lock_guard

2021-01-15 Thread Igor Kudrin via llvm-branch-commits

Author: Igor Kudrin
Date: 2021-01-15T16:11:45+07:00
New Revision: 78036360573c35ea9e6a697d2eed92db893b4850

URL: https://github.com/llvm/llvm-project/commit/78036360573c35ea9e6a697d2eed92db893b4850
DIFF: https://github.com/llvm/llvm-project/commit/78036360573c35ea9e6a697d2eed92db893b4850.diff

LOG: [libcxx testing] Fix UB in tests for std::lock_guard

If mutex::try_lock() is called in a thread that already owns the mutex,
the behavior is undefined. The patch fixes the issue by creating another
thread, where the call is allowed.

Differential Revision: https://reviews.llvm.org/D94656
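The pattern can be shown with plain standard C++. The libc++ tests use their own `support::make_test_thread` helper; this sketch uses `std::thread` directly, and both function names are hypothetical. Only the "mutex is held" case is asserted, because `try_lock()` on a free mutex is allowed to fail spuriously.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Calls M.try_lock() from a newly created thread and reports the result.
// Unlike calling it from the thread that already owns M (undefined behavior),
// this is well-defined: it simply fails while another thread holds M.
bool tryLockFromAnotherThread(std::mutex &M) {
  bool Acquired = false;
  std::thread T([&] {
    Acquired = M.try_lock();
    if (Acquired)
      M.unlock(); // Never leave the mutex owned by a finished thread.
  });
  T.join(); // join() makes the write to Acquired visible to this thread.
  return Acquired;
}

void checkLockGuardHoldsMutex() {
  std::mutex M;
  std::lock_guard<std::mutex> Guard(M);
  // This thread owns M through Guard, so another thread must fail to lock it.
  assert(!tryLockFromAnotherThread(M));
}
```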

Added: 


Modified: 
libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/adopt_lock.pass.cpp
libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/mutex.pass.cpp

Removed: 




diff  --git a/libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/adopt_lock.pass.cpp b/libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/adopt_lock.pass.cpp
index 5135dbcef816..db6a2e35f9c5 100644
--- a/libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/adopt_lock.pass.cpp
+++ b/libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/adopt_lock.pass.cpp
@@ -18,15 +18,21 @@
 #include 
 #include 
 
+#include "make_test_thread.h"
 #include "test_macros.h"
 
 std::mutex m;
 
+void do_try_lock() {
+  assert(m.try_lock() == false);
+}
+
 int main(int, char**) {
   {
     m.lock();
     std::lock_guard lg(m, std::adopt_lock);
-    assert(m.try_lock() == false);
+    std::thread t = support::make_test_thread(do_try_lock);
+    t.join();
   }
 
   m.lock();

diff  --git a/libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/mutex.pass.cpp b/libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/mutex.pass.cpp
index 0e096eabe4b6..5dcecd344c36 100644
--- a/libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/mutex.pass.cpp
+++ b/libcxx/test/std/thread/thread.mutex/thread.lock/thread.lock.guard/mutex.pass.cpp
@@ -21,14 +21,20 @@
 #include 
 #include 
 
+#include "make_test_thread.h"
 #include "test_macros.h"
 
 std::mutex m;
 
+void do_try_lock() {
+  assert(m.try_lock() == false);
+}
+
 int main(int, char**) {
   {
     std::lock_guard lg(m);
-    assert(m.try_lock() == false);
+    std::thread t = support::make_test_thread(do_try_lock);
+    t.join();
   }
 
   m.lock();


