[jira] [Updated] (ARROW-13615) [R] Bindings for stringr::str_to_sentence

2021-10-13 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-13615:

Fix Version/s: 7.0.0  (was: 6.0.0)

> [R] Bindings for stringr::str_to_sentence
> -
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Nicola Crane
>Priority: Major
>  Labels: good-first-issue, kernel
> Fix For: 7.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-13615) [R] Bindings for stringr::str_to_sentence

2021-10-20 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-13615:
---
Labels: good-first-issue kernel pull-request-available  (was: good-first-issue kernel)

> [R] Bindings for stringr::str_to_sentence
> -
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Nicola Crane
>Assignee: Dragoș Moldovan-Grünfeld
>Priority: Major
>  Labels: good-first-issue, kernel, pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>






[jira] [Updated] (ARROW-13615) [R] Bindings for stringr::str_to_sentence

2021-10-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dragoș Moldovan-Grünfeld updated ARROW-13615:
-
Description: 
There is more to this issue than meets the eye.
{{stringr::str_to_sentence()}} does two things:
 * capitalises the first word of the string
 * if multiple sentences are provided as a single string, attempts to find
sentence breaks and capitalise the first word of each sentence.

The {{stringr}} implementation wraps {{stringi::stri_trans_totitle()}}, which in
turn uses ICU’s BreakIterator to locate text boundaries. As a consequence,
{{stringr::str_to_sentence()}} is not able to identify a full stop / period
(".") as a sentence end and does not capitalise the word following it. Thus,
there is a discrepancy between the behaviour of the {{utf8_capitalize}} kernel
(which capitalises the first word of a string without making any attempt to
break it into sentences) and the behaviour of {{stringr::str_to_sentence()}}.

For more extensive discussions around the {{stringi / stringr}} implementation 
see {{stringr}} issues [202|https://github.com/tidyverse/stringr/issues/202] 
and [231|https://github.com/tidyverse/stringr/issues/231].

Due to the complexity of this issue and the relatively niche use cases, the 
recommendation is to postpone implementation until someone requests it.
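
To make the two behaviours contrasted in the description concrete, here is a
minimal sketch in plain Python. It is not the Arrow kernel, stringr, or ICU:
the function names {{capitalize_first}} and {{capitalize_sentences}} are
hypothetical, and the sentence-boundary rule (".", "!" or "?" followed by
whitespace) is a deliberately naive assumption that only approximates what a
real break iterator does.

```python
import re


def capitalize_first(text: str) -> str:
    """Uppercase the first character and lowercase the rest.

    No attempt is made to detect sentence boundaries.
    """
    return text[:1].upper() + text[1:].lower()


# Assumed, simplified boundary rule: start of string, or one of . ! ?
# followed by whitespace. Real sentence segmentation is far more involved.
_SENTENCE_START = re.compile(r"(^\s*|[.!?]\s+)(\w)")


def capitalize_sentences(text: str) -> str:
    """Uppercase the first letter after each assumed sentence boundary."""
    lowered = text.lower()
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), lowered
    )


sample = "hello there. how are you? fine"
print(capitalize_first(sample))      # Hello there. how are you? fine
print(capitalize_sentences(sample))  # Hello there. How are you? Fine
```

The first function only ever touches the start of the string; the second
depends entirely on how sentence boundaries are defined, which is where the
complexity discussed in this issue comes from.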

> [R] Bindings for stringr::str_to_sentence
> -
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Nicola Crane
>Assignee: Dragoș Moldovan-Grünfeld
>Priority: Major
>  Labels: good-first-issue, kernel, pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-13615) [R] Bindings for stringr::str_to_sentence

2021-10-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dragoș Moldovan-Grünfeld updated ARROW-13615:
-
Description: 
There is more to this issue than meets the eye.
{{stringr::str_to_sentence()}} does two things:
 * capitalises the first word of the string
 * if multiple sentences are provided as a single string, attempts to find
sentence breaks and capitalise the first word of each sentence.

The {{stringr}} implementation wraps {{stringi::stri_trans_totitle()}}, which in
turn uses ICU’s BreakIterator to locate text boundaries. As a consequence,
{{stringr::str_to_sentence()}} is not able to identify a full stop / period
(".") as a sentence end and does not capitalise the word following it. Thus,
there is a discrepancy between the behaviour of the {{utf8_capitalize}} kernel
(which capitalises the first word of a string without making any attempt to
break it into sentences) and the behaviour of {{stringr::str_to_sentence()}}.

For more extensive discussions around the {{stringi / stringr}} implementation 
see {{stringr}} issues [202|https://github.com/tidyverse/stringr/issues/202] 
and [231|https://github.com/tidyverse/stringr/issues/231].

Due to the complexity of this issue and the relatively niche use cases, the 
recommendation is to postpone implementation.

  was:
There is more to this issue than meets the eye.
{{stringr::str_to_sentence()}} does two things:
 * capitalises the first word of the string
 * if multiple sentences are provided as a single string, attempts to find
sentence breaks and capitalise the first word of each sentence.

The {{stringr}} implementation wraps {{stringi::stri_trans_totitle()}}, which in
turn uses ICU’s BreakIterator to locate text boundaries. As a consequence,
{{stringr::str_to_sentence()}} is not able to identify a full stop / period
(".") as a sentence end and does not capitalise the word following it. Thus,
there is a discrepancy between the behaviour of the {{utf8_capitalize}} kernel
(which capitalises the first word of a string without making any attempt to
break it into sentences) and the behaviour of {{stringr::str_to_sentence()}}.

For more extensive discussions around the {{stringi / stringr}} implementation 
see {{stringr}} issues [202|https://github.com/tidyverse/stringr/issues/202] 
and [231|https://github.com/tidyverse/stringr/issues/231].

Due to the complexity of this issue and the relatively niche use cases, the 
recommendation is to postpone implementation until someone requests it.


> [R] Bindings for stringr::str_to_sentence
> -
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Nicola Crane
>Assignee: Dragoș Moldovan-Grünfeld
>Priority: Major
>  Labels: good-first-issue, kernel, pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>





[jira] [Updated] (ARROW-13615) [R] Bindings for stringr::str_to_sentence

2021-11-25 Thread Jira


 [ 
https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dragoș Moldovan-Grünfeld updated ARROW-13615:
-
Parent: ARROW-14865
Issue Type: Sub-task  (was: Improvement)

> [R] Bindings for stringr::str_to_sentence
> -
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: R
>Reporter: Nicola Crane
>Assignee: Dragoș Moldovan-Grünfeld
>Priority: Major
>  Labels: good-first-issue, kernel, pull-request-available
> Fix For: 7.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> There is more to this issue than meets the eye.
> {{stringr::str_to_sentence()}} does two things:
>  * capitalises the first word of the string
>  * if multiple sentences are provided as a single string, attempts to find
> sentence breaks and capitalise the first word of each sentence.
>
> The {{stringr}} implementation wraps {{stringi::stri_trans_totitle()}}, which
> in turn uses ICU’s BreakIterator to locate text boundaries. As a consequence,
> {{stringr::str_to_sentence()}} is not able to identify a full stop / period
> (".") as a sentence end and does not capitalise the word following it. Thus,
> there is a discrepancy between the behaviour of the {{utf8_capitalize}}
> kernel (which capitalises the first word of a string without making any
> attempt to break it into sentences) and the behaviour of
> {{stringr::str_to_sentence()}}.
>
> For more extensive discussions around the {{stringi / stringr}}
> implementation see {{stringr}} issues
> [202|https://github.com/tidyverse/stringr/issues/202] and
> [231|https://github.com/tidyverse/stringr/issues/231].
>
> Due to the complexity of this issue and the relatively niche use cases, the
> recommendation is to postpone implementation.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Updated] (ARROW-13615) [R] Bindings for stringr::str_to_sentence

2021-08-17 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-13615:

Labels: good-first-issue kernel  (was: )

> [R] Bindings for stringr::str_to_sentence
> -
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Nic Crane
>Priority: Major
>  Labels: good-first-issue, kernel
> Fix For: 6.0.0
>
>






[jira] [Updated] (ARROW-13615) [R] Bindings for stringr::str_to_sentence

2021-08-17 Thread Neal Richardson (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Neal Richardson updated ARROW-13615:

Fix Version/s: 6.0.0

> [R] Bindings for stringr::str_to_sentence
> -
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Nic Crane
>Priority: Major
> Fix For: 6.0.0
>
>






[jira] [Updated] (ARROW-13615) [R] Bindings for stringr::str_to_sentence

2021-10-01 Thread Alessandro Molina (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-13615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alessandro Molina updated ARROW-13615:
--
Fix Version/s: 7.0.0  (was: 6.0.0)

> [R] Bindings for stringr::str_to_sentence
> -
>
> Key: ARROW-13615
> URL: https://issues.apache.org/jira/browse/ARROW-13615
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: R
>Reporter: Nicola Crane
>Priority: Major
>  Labels: good-first-issue, kernel
> Fix For: 7.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)