On Sat, May 12, 2018 at 2:16 AM, Daniel Frey <djqf...@gmail.com> wrote:
> Hi all,
>
> I am trying to do something relatively simple and I've had something
> working in the past, but my brain just doesn't want to work today.
>
> I have a text file with the following (this is just a subset of about
> 2500 dates, and I don't want to edit these all by hand if I can avoid it):
>
> --- START ---
> December 2, 1994
> March 27, 1992
> June 4, 1994
> 1993
> January 11, 1992
> January 3, 1995
>
>
> March 12, 1993
> July 12, 1991
> May 17, 1991
> August 7, 1992
> December 23, 1994
> March 27, 1992
> March 1995
> --- END ---
>
> As you can see, there's no standard in the way the date is formatted.
> Some of them are also formatted YYYY-MM-DD and MM-DD-YYYY.
>
> I have a basic grep that I tossed together:
>
> grep -o '\([0-9]\{4\}\)'
>
> This does extract the year but yields the following:
>
> 1994
> 1992
> 1994
> 1993
> 1992
> 1995
> 1993
> 1991
> 1991
> 1992
> 1994
> 1992
> 1995
>
> As you can see, the two empty lines are removed but this will cause
> problems with data not lining up later on.
>
> Does anyone have a quick tip for my tired brain to make this work and
> just output a blank line if there's no match? I swear I did this months
> ago and had something working but I apparently didn't bother saving the
> script I made. Argh!
>
> Dan
>

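Here are an awk script and a sed script for you to try. The sample file is
your list with one YYYY-MM-DD and one MM-DD-YYYY date added at the end: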
cat dates
December 2, 1994
March 27, 1992
June 4, 1994
1993
January 11, 1992
January 3, 1995


March 12, 1993
July 12, 1991
May 17, 1991
August 7, 1992
December 23, 1994
March 27, 1992
March 1995

2018-05-12
05-12-2018

awk '
# Print the first run of four digits found on the line.
match($0, /[0-9][0-9][0-9][0-9]/) {
    print substr($0, RSTART, RLENGTH)
}
# Pass empty lines through unchanged.
/^$/
' dates

1994
1992
1994
1993
1992
1995


1993
1991
1991
1992
1994
1992
1995

2018
2018
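
Note that the awk script above prints a blank line only when the input line
is itself empty; a non-empty line with no four-digit number in it is dropped.
If you want exactly one output line per input line no matter what, something
like the following might do. This is only a sketch, and the function name
extract_year is made up for illustration:

```shell
# extract_year is a hypothetical helper name, not part of the original script.
# It prints the first run of four digits on each line, or an empty line when
# there is none, so the output has as many lines as the input.
extract_year() {
    awk '{
        if (match($0, /[0-9][0-9][0-9][0-9]/))
            print substr($0, RSTART, RLENGTH)
        else
            print ""
    }' "$@"
}

# Reads standard input when no file name is given.
printf '%s\n' 'December 2, 1994' '' 'no year here' '05-12-2018' | extract_year
```

With a file, the call would be "extract_year dates".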

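sed '
# Reduce a line containing four digits in a row to those digits and print it.
s/.*\([0-9][0-9][0-9][0-9]\).*/\1/p
# Print empty lines as well, then delete everything to suppress the default output.
/^$/p
d' dates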

1994
1992
1994
1993
1992
1995


1993
1991
1991
1992
1994
1992
1995

2018
2018
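
The sed script above likewise drops any non-empty line that has no four
digits in a row. A possible variant that always emits one line per input
line is sketched below; the function name year_or_blank is made up for
illustration. Because the leading .* is greedy, sed keeps the last run of
four digits on the line, which makes no difference for dates like the ones
in your file:

```shell
# year_or_blank is a hypothetical helper name used only for this sketch.
# If the substitution succeeds, "t" jumps to the end of the script and the
# year is printed; otherwise the line is emptied and printed as a blank line.
year_or_blank() {
    sed 's/.*\([0-9][0-9][0-9][0-9]\).*/\1/
t
s/.*//' "$@"
}

# Reads standard input when no file name is given.
printf '%s\n' 'March 1995' '' 'unknown' '2018-05-12' | year_or_blank
```

With a file, the call would be "year_or_blank dates".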
