Re: How to get the current time in time zone represented by strings like +0100?
> Yes. I think you will find this is all described in the manual at > https://www.gnu.org/software/coreutils/manual/html_node/Specifying-time-zone-rules.html """ ‘TZ="<+0530>-5:30"’ says that the time zone abbreviation is ‘+0530’ and the time zone is 5 hours 30 minutes east of Greenwich. """ The negative means east in "-5:30". But why is time zone abbreviation "<+0530>" positive? This is confusing. Why not make them consistent? -- Regards, Peng
Re: How to get the current time in time zone represented by strings like +0100?
My understanding in the last email is wrong. So +02:00 in your example is actually -0200 in my example, can date take the meaning "+" as in my original example? Or I will have to flip the signs myself? $ TZ=Europe/Paris date Thu May 16 15:27:48 CEST 2024 $ TZ='XXX+02:00' date Thu May 16 11:28:06 XXX 2024 On Thu, May 16, 2024 at 8:24 AM Peng Yu wrote: > > On Wed, May 15, 2024 at 12:04 AM Grisha Levit wrote: > > > > On Tue, May 14 2024 at 16:05 Peng Yu wrote: > > > For example, in the time zone represented by +0100, how to get its > > > current time from date using '+0100' as input? Thanks. > > > > Use the offset to create a timezone specification, supplied in the TZ > > environment variable. > > > > TZ='XXX-01:00' date > > Strings like +0100 is relative to UTC. For example, +0100 is Central > European Time. I guess that you understood +0100 as relative to my > current timezone. > > How to achieve +0100 as relative to UTC with date? > > > The `XXX` is an arbitrary (required) name. Note that the sign of the > > offset has the opposite of its usual meaning. The full format can be > > found in the tzset(3) man page. > > -- > Regards, > Peng -- Regards, Peng
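The sign flip discussed above can be done mechanically. The following is a minimal Python sketch, assuming the input is always of the form +HHMM or -HHMM; the helper name `posix_tz_from_offset` is hypothetical and not part of coreutils. It keeps the original offset as the abbreviation (in angle brackets, as in the manual's example) and inverts the sign for the POSIX part.

```python
import re

def posix_tz_from_offset(offset: str) -> str:
    """Turn an ISO-style offset such as '+0100' into a POSIX TZ string.

    POSIX TZ offsets count hours west of Greenwich, so the sign is
    inverted; the original offset is kept as the abbreviation in <...>.
    """
    m = re.fullmatch(r"([+-])(\d{2})(\d{2})", offset)
    if m is None:
        raise ValueError(f"expected +HHMM or -HHMM, got {offset!r}")
    sign, hh, mm = m.groups()
    flipped = "-" if sign == "+" else "+"
    return f"<{offset}>{flipped}{hh}:{mm}"

print(posix_tz_from_offset("+0100"))  # <+0100>-01:00
```

The printed string is meant to be used as the value of TZ, for example TZ='<+0100>-01:00' date.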
Re: How to get the current time in time zone represented by strings like +0100?
On Wed, May 15, 2024 at 12:04 AM Grisha Levit wrote: > > On Tue, May 14 2024 at 16:05 Peng Yu wrote: > > For example, in the time zone represented by +0100, how to get its > > current time from date using '+0100' as input? Thanks. > > Use the offset to create a timezone specification, supplied in the TZ > environment variable. > > TZ='XXX-01:00' date Strings like +0100 is relative to UTC. For example, +0100 is Central European Time. I guess that you understood +0100 as relative to my current timezone. How to achieve +0100 as relative to UTC with date? > The `XXX` is an arbitrary (required) name. Note that the sign of the > offset has the opposite of its usual meaning. The full format can be > found in the tzset(3) man page. -- Regards, Peng
How to get the current time in time zone represented by strings like +0100?
Hi, For example, in the time zone represented by +0100, how to get its current time from date using '+0100' as input? Thanks. -- Regards, Peng
Re: mkdir -p competition on the same directory?
OK. I see the following output of `sudo dtruss mkdir -p d`. So essentially, coreutils first calls the system function mkdir to make the directory. If the system call fails, it checks whether the target is a directory. If the target is indeed a directory, then no error message is printed. Do I understand it correctly?

...
mkdir("d\0", 0x1FF, 0x0) = -1 Err#17
stat64("d\0", 0x7FFEE9953D20, 0x0) = 0 0
...

Therefore, when many calls to coreutils `mkdir -p` compete, the first instance creates the target, and the remaining instances fail on the mkdir system call. But since they find that the target already exists and is a directory, they do not complain about the failed mkdir system call. That is why I never see an error similar to that of the bash loadable `mkdir -p`. Is that so?

On 2/9/23, Pádraig Brady wrote:
> On 09/02/2023 14:57, Peng Yu wrote:
>> https://lists.gnu.org/archive/html/help-bash/2023-02/msg00053.html
>>
>> Bash loadable `mkdir -p` has a problem when multiple loadable `mkdir
>> -p` is called on the same directory simultaneously.
>>
>> But I never see coreutils' `mkdir -p` has the same problem. Does
>> coreutils' `mkdir -p` do something extra to guard against the
>> competition on the same directory?
>
> `mkdir d; strace mkdir -p d` would be instructive,
> but yes coreutils mkdir essentially does:
>
>if mkdir(d) == EEXIST
>  return stat(d) == S_ISDIR
>
> cheers,
> Pádraig
>

-- Regards, Peng
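The "mkdir, then check on EEXIST" pattern described in this message can be sketched in Python. This is only an illustration of the idea, not the coreutils source; the function name `mkdir_tolerant` is made up for the example.

```python
import errno
import os
import tempfile

def mkdir_tolerant(path: str, mode: int = 0o777) -> None:
    """Create path; treat 'already exists as a directory' as success."""
    try:
        os.mkdir(path, mode)
    except OSError as e:
        # Another process may have created it first (EEXIST). That is
        # fine as long as what exists is a directory.
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise

with tempfile.TemporaryDirectory() as tmp:
    d = os.path.join(tmp, "d")
    mkdir_tolerant(d)  # creates the directory
    mkdir_tolerant(d)  # loses the "race" but does not fail
    print(os.path.isdir(d))  # True
```

Because the existence check happens only after mkdir has failed, there is no window between "check" and "create" in which a competing process could cause a spurious error.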
mkdir -p competition on the same directory?
https://lists.gnu.org/archive/html/help-bash/2023-02/msg00053.html Bash loadable `mkdir -p` has a problem when multiple loadable `mkdir -p` is called on the same directory simultaneously. But I never see coreutils' `mkdir -p` has the same problem. Does coreutils' `mkdir -p` do something extra to guard against the competition on the same directory? -- Regards, Peng
How to make mv -i return a nonzero status when the user chooses n?
Hi, When I use mv -i and choose n so that the destination will not be overwritten, the return status is still zero. Is there a way to let mv return nonzero status to reflect that n is chosen by the user? Thanks. -- Regards, Peng
How to count the last line when it does not end with a newline character?
I got 1 instead of 2 in the following example. How can I count the last line even when it does not end with a newline character? Thanks.

$ printf 'a\nb'|wc -l
1

-- Regards, Peng
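wc -l counts newline characters, which is why the unterminated last line is not included. A minimal Python sketch of the count that is wanted here, assuming the whole input fits in memory (`count_lines` is a name chosen for illustration):

```python
def count_lines(data: bytes) -> int:
    """Count lines, including a final line that lacks a trailing newline."""
    lines = data.count(b"\n")
    if data and not data.endswith(b"\n"):
        lines += 1  # the unterminated last line
    return lines

print(count_lines(b"a\nb"))    # 2
print(count_lines(b"a\nb\n"))  # 2
```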
Re: how to speed up sort for partially sorted input?
On Wed, Aug 11, 2021 at 1:43 PM Kaz Kylheku (Coreutils) <962-396-1...@kylheku.com> wrote: > > On 2021-08-11 05:03, Peng Yu wrote: > > On Wed, Aug 11, 2021 at 5:29 AM Carl Edquist > > wrote: > >> (With just a bit more work, you can do all your sorting in a single > >> awk > >> process too (without piping out to sort), but i think you'll still be > >> disappointed with the performance compared to a single sort command.) > > > > Yes, this involves many calls of the coreuils' sort, which is not > > No, not this last remark, which is about "in a single awk process". I know there is one awk process. I don't understand why you mentioned it. > > efficient. Would it make sense to add an option in sort so that sort > > can sort a partially sorted input in one shot. > > IF you're willing to use GNU Coreutils instead of Unix, you probably > have I don't think using awk is efficient. I am program a number awk programs for simple transforming the input and tested it, in general, it is slower than the equivalent python code, let along C code. You can talk about doing most of the work in awk below. I don't think that make sense. Having coreutils' sort be able to do a partial sort is a more reasonable solution. > GNU Awk also. GNU Awk has a sorting function using which a solution > could be cobbed together. Maybe something like: > > > function dump_delete_data() > { > n = asorti(data, idx); > for (i = 1; i <= n; i++) > print data[idx[i]]; > delete data > serial = 0 > } > > BEGIN { serial = 0 } > $1 != prev_1{ dump_delete_data() } > NF >= 2 { prev_1 = $1 >data[$2 "." serial++] = $0 >next } > 1 { dump_delete_data() >print } > END { dump_delete_data() } > > > The asorti function has some features behind it to sort in various ways; > you have to look into that. It involves manipulating a > PROCINFO["sorted_in"] > value. > > It's possible to use a custom comparison function. > > For more info, see GNU Awk documentation, the Gawk mailing list or > the comp.lang.awk newsgroup. 
> > The purpose of the serial variable in my above code so that we get > two entries in data[] if in a given group, there are identical $2 > values. > > For instance if $2 is "foo", then the key we use is actually "foo.3" if > the current value of serial is 3. The sorting is then done on these > suffixed keys, which works okay for lexicographic sorting. > > It is not a stable sort, though! Because foo.123 will be sorted before > foo.23, even though the 123 serial value comes later. If we padded the > integer with enough leading zeros for the largest possible group, it > would then be stable: foo.00023 would come before foo.00123: > > data[sprintf("%s.%08X", $2, serial++)] = $0 > > kind of thing. > > If you don't care about reproducing duplicates, you can remove this > logic > entirely. > > How the overall program works is that data[] is an array indexed on the > second column values (plus serial suffixes). The value of each index > value is the entire record, $0. > > asorti sorts the $2 indices, throwing away the $0 values, which > is why we direct it into a secondary array called idx, preserving > the data array. The idx array ends up indexed on integer values 1 to N, > where N is the chunk size. If we iterate over these values, idx[i] > gives us the $2 column values (with serial suffix) in sorted order. > We can then use that as the key into data[] to get the corresponding > records in sorted order. > > Cheers ... > -- Regards, Peng
Re: how to speed up sort for partially sorted input?
On Wed, Aug 11, 2021 at 5:29 AM Carl Edquist wrote: > > On Tue, 10 Aug 2021, Kaz Kylheku (Coreutils) wrote: > > > On 2021-08-07 17:46, Peng Yu wrote: > >> Hi, > >> > >> Suppose that I want to sort an input by column 1 and column 2 (column > >> 1 is of a higher priority than column 2). The input is already sorted > >> by column1. > >> > >> Is there a way to speed up the sort (compared with not knowing column > >> 1 is already sorted)? Thanks. > > > > Since you know that colum 1 is sorted, it means that a sequential scan > > of the data will reveal chunks that have the same colum1 value. > > > > You just have to read and separate these chunks, and sort each one > > individually by column 2. > > Neat observation. > > You could do that tersely in awk by piping each chunk to a separate sort > process, like: > > awk ' > c1 != $1 { close(sort); c1 = $1 } > { print | sort } > ' sort="sort -k2,2" partially-sorted-input.txt > > In theory, that would bring the sorting work down from ~ O(n * log(n)) to > ~ O(n * log(n/m)) (for a partially-sorted file with n lines and m > column-1 chunks of equal size). > > But the overhead of starting a new sort process for each chunk is likely > going to outweigh that advantage. In the end, just sorting the whole file > at once (despite column 1 already being sorted) is still likely to be > faster. > > (With just a bit more work, you can do all your sorting in a single awk > process too (without piping out to sort), but i think you'll still be > disappointed with the performance compared to a single sort command.) Yes, this involves many calls of the coreuils' sort, which is not efficient. Would it make sense to add an option in sort so that sort can sort a partially sorted input in one shot. -- Regards, Peng
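The chunking idea from this thread (scan for runs with the same column 1, sort each run by column 2) can be sketched in a single process in Python. This is an illustration under assumptions, not a proposal for how sort itself would do it: fields are assumed to be whitespace-separated, every line is assumed to have at least two fields, and `sort_within_chunks` is a hypothetical name.

```python
from itertools import groupby

def field(n):
    """Return a key function that extracts the n-th whitespace field."""
    return lambda line: line.split()[n]

def sort_within_chunks(lines):
    """Yield lines sorted by column 2 inside each run of equal column 1.

    Assumes the input is already sorted (or at least grouped) on column 1.
    """
    for _, chunk in groupby(lines, key=field(0)):
        # sorted() is stable, so ties on column 2 keep their input order.
        yield from sorted(chunk, key=field(1))

rows = ["a 3", "a 1", "b 2", "b 1", "b 2x"]
print(list(sort_within_chunks(rows)))
# ['a 1', 'a 3', 'b 1', 'b 2', 'b 2x']
```

Only one chunk is held in memory at a time, and no extra process is started per chunk.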
how to speed up sort for partially sorted input?
Hi, Suppose that I want to sort an input by column 1 and column 2 (column 1 is of a higher priority than column 2). The input is already sorted by column 1. Is there a way to speed up the sort (compared with not knowing column 1 is already sorted)? Thanks. -- Regards, Peng
How to get the past Mon/Tue/.. not after the current date?
Hi,

$ date -d 'last Tue' +%Y-%m-%d
2021-07-13
$ date +%Y-%m-%d
2021-07-20
$ date -d 'last Mon' +%Y-%m-%d
2021-07-19

I want to get the most recent given day of the week that is not in the future. In the above example, I want to get this Tue (2021-07-20) instead of the last Tue (2021-07-13). But for Mon, I want to get 2021-07-19. Is there a date string to get such output?

-- Regards, Peng
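The arithmetic behind "the most recent Tuesday that is not after today" is a modulo-7 subtraction. A small Python sketch of that calculation, using the dates from the example (`most_recent_weekday` is a name made up for illustration):

```python
from datetime import date, timedelta

def most_recent_weekday(today: date, weekday: int) -> date:
    """Latest date <= today falling on weekday (Monday=0 ... Sunday=6)."""
    days_back = (today.weekday() - weekday) % 7
    return today - timedelta(days=days_back)

today = date(2021, 7, 20)             # a Tuesday
print(most_recent_weekday(today, 1))  # 2021-07-20 (Tue)
print(most_recent_weekday(today, 0))  # 2021-07-19 (Mon)
```

The same day count could be fed back to date as a relative item such as "N days ago".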
od and UTF-8
Hi, I am wondering whether there is a way to do something similar to od, but respecting UTF-8 characters. For example, instead of printing this,

$ od -c -t x1 -Ax <<< α
00 � � \n
   ce b1 0a
03

I want to print this. Basically, if it is a printable UTF-8 character that does not require escaping, just print it as is.

$ od -c -t x1 -Ax <<< α
00 α
   ce b1 0a
03

But if it needs some transformation to be seen clearly, that should be done. For example, a space character should be printed as ' ' (with the single quotes printed). Other space characters in UTF-8 should also be printed with single quotes enclosing the space. Is there a tool to do so?

-- Regards, Peng
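The behaviour asked for here can be sketched in a few lines of Python. This is a rough illustration rather than an od replacement: it assumes the input is valid UTF-8 (invalid bytes raise an error), and `dump_utf8` is a hypothetical helper name. Each character is paired with its bytes in hex; spaces and non-printable characters are shown quoted and escaped.

```python
def dump_utf8(data: bytes):
    """Return (display, hex bytes) pairs, one per decoded character."""
    out = []
    for ch in data.decode("utf-8"):
        raw = ch.encode("utf-8")
        if ch.isspace() or not ch.isprintable():
            shown = repr(ch)  # quotes included, e.g. ' ' or '\n'
        else:
            shown = ch
        out.append((shown, " ".join(f"{b:02x}" for b in raw)))
    return out

for shown, hexbytes in dump_utf8("α \n".encode("utf-8")):
    print(f"{shown:6} {hexbytes}")
```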
make ls -l dir1 dir2 in the same order as dir1,2 are specified
Hi, When I try `ls -l dir1 dir2`, the order of dir1 and dir2 in the output is not necessarily the same as the input. How to make it the same as the input order? Thanks. -- Regards, Peng
Re: difference between mkfifo and mknod ... p
On Wed, Mar 24, 2021 at 4:52 PM Bob Proulx wrote: > > Peng Yu wrote: > > It seems that both `mkfifo` and `mknod ... p` can create a fifo. What > > is the difference between them? Thanks. > > The mknod utility existed "for a decade" in Unix (don't quote me on > that vague time statement) before mkfifo existed. The mknod utility > existed in Unix v7 as a thin wrapper around the mknod(2) system call. > > man 2 mknod > > https://pubs.opengroup.org/onlinepubs/9699919799/functions/mknod.html > > Named pipes are special files and special files are created with > mknod. At least that was true until mkfifo came along. mkfifo was > standardized by POSIX while the mknod utility seems too OS specific > and never made it into the standards as far as I know. > > Therefore "mkfifo" should be used for standards compliance and "mknod" > should continue to exist for backwards compatibility. In that case, should a warning message be printed to persuade people not to use it? Otherwise, people will continue to use it. By discouraging people from using it for a long period (say 10 years), its support can be dropped eventually which will reduce future maintenance costs of this duplicate code. -- Regards, Peng
Print modification time in compact form
Hi, I see that the modification time can be printed in this format.

$ stat -c '%y' file.txt
2017-07-31 17:50:54.0 +0100

Is there a way to directly print it as 20170731-1750? Thanks.

-- Regards, Peng
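With GNU date, `date -r file.txt +%Y%m%d-%H%M` formats a file's modification time directly. The same formatting can be sketched in Python; `compact_mtime` is a name chosen for the example, and the result is in local time.

```python
import os
import tempfile
from datetime import datetime

def compact_mtime(path: str) -> str:
    """Format a file's modification time as YYYYmmdd-HHMM (local time)."""
    mtime = os.stat(path).st_mtime
    return datetime.fromtimestamp(mtime).strftime("%Y%m%d-%H%M")

with tempfile.NamedTemporaryFile() as f:
    # Give the temporary file the timestamp from the example above.
    stamp = datetime(2017, 7, 31, 17, 50, 54).timestamp()
    os.utime(f.name, (stamp, stamp))
    print(compact_mtime(f.name))  # 20170731-1750
```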
Re: difference between mkfifo and mknod ... p
Thanks. Why is there such a redundancy? Is it for backward compatibility? If not for backward compatibility, I’d think mknod ... p should be removed, for this syntax is worse than that of mkfifo. On Sat, Mar 13, 2021 at 7:48 AM Steeve McCauley wrote: > Ah, sorry, yeah more or less identical when it comes to making the fifo, > > $ strace mknod mknod p 2>&1 | grep -i fifo > mknod("mknod", S_IFIFO|0666)= 0 > $ strace mkfifo mkfifo 2>&1 | grep -i fifo > execve("/usr/bin/mkfifo", ["mkfifo", "mkfifo"], 0x7ffe46acfb88 /* 69 vars > */) = 0 > mknod("mkfifo", S_IFIFO|0666) = 0 > > $ ls -l mk* > prw-r--r-- 1 steeve steeve 0 Mar 13 08:45 mkfifo > prw-r--r-- 1 steeve steeve 0 Mar 13 08:45 mknod > > > > On Sat, Mar 13, 2021 at 8:38 AM Peng Yu wrote: > >> But my question is the p of mknod and mkfifo. Are they the same or >> different? >> >> On Sat, Mar 13, 2021 at 5:21 AM Steeve McCauley < >> steeve.mccau...@gmail.com> wrote: >> >>> mknod can make character (c) or block (b) or pipe (p) device files (as >>> found under /dev). >>> >>> mkfifo makes "named pipes" so they can behave like files. >>> >>> https://en.wikipedia.org/wiki/Named_pipe >>> >>> On Sat, Mar 13, 2021 at 12:44 AM Peng Yu wrote: >>> >>>> Hi, >>>> >>>> It seems that both `mkfifo` and `mknod ... p` can create a fifo. What >>>> is the difference between them? Thanks. >>>> >>>> -- >>>> Regards, >>>> Peng >>>> >>>> >>> >>> -- >>> :wq >>> >> -- >> Regards, >> Peng >> > > > -- > :wq > -- Regards, Peng
difference between mkfifo and mknod ... p
Hi, It seems that both `mkfifo` and `mknod ... p` can create a fifo. What is the difference between them? Thanks. -- Regards, Peng
How to ensure UTF-8 sort?
Hi, I want to make sure sort always uses UTF-8. But I am not sure what locale is universally available on all OSes. Does anybody know the correct way to make sure sort uses UTF-8 on all machines where coreutils is installed? Thanks. -- Regards, Peng
Better support of timezone abbreviation in `date`?
Hi, It looks like some time zone abbreviations are not supported by `date`. For example, THA is not supported. Can a more comprehensive support be added? Thanks. https://www.timeanddate.com/time/zone/thailand -- Regards, Peng
What timezone strings are supported by `date`?
Hi, It seems that time zone strings like CET and PST are supported by `date`. But I cannot find a complete list of such strings supported by `date`. Is there a doc that describes all of them? Thanks. -- Regards, Peng
What is the interpretation of bs of dd in terms of predicting the disk performance of other I/O bound programs?
Hi, Many people use dd to test disk performance. There is a key option of dd, bs, and I understand what it literally means. But it is not clear how the performance measured by dd using a specific bs maps to the disk performance of other I/O-bound programs. Could anybody let me know the interpretation of bs in terms of predicting the performance of other I/O-bound programs? Thanks. -- Regards, Peng
Re: How tail works on a large file?
Do you mean that I need to run `hexedit the_large_file`. What is the purpose of this? I don't quite understand. On 8/22/20, Budi wrote: > use wxHexEditor or Curses Hexedit > hit End to bring us to the tail > > On 8/22/20, Peng Yu wrote: >> Hi, >> >> I tried to tail a large file (2.8GB) to get is last 10 lines. It runs >> very >> fast. >> >> How is this achieved? Does tail do it differently between a file >> (random disk access) and a pipe (sequential disk access)? Thanks. >> >> -- >> Regards, >> Peng >> >> > -- Regards, Peng
How tail works on a large file?
Hi, I tried to tail a large file (2.8GB) to get its last 10 lines. It runs very fast. How is this achieved? Does tail do it differently between a file (random disk access) and a pipe (sequential disk access)? Thanks. -- Regards, Peng
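For a regular file, a tail-like tool can seek to the end and read backward until it has seen enough newlines, so the cost depends on the size of the last lines rather than on the size of the file. A pipe cannot be seeked and has to be read from the start. The following Python sketch illustrates the seek-from-the-end idea only; it is not GNU tail's code, and `tail_lines` and its `block` parameter are names invented for the example.

```python
import os
import tempfile

def tail_lines(path: str, n: int = 10, block: int = 8192) -> list:
    """Return the last n (n >= 1) lines by reading blocks from the end."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # Keep prepending blocks until more than n newlines are buffered,
        # which guarantees that the last n lines are complete.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    return data.splitlines()[-n:]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.txt")
    with open(path, "wb") as f:
        f.writelines(b"line %d\n" % i for i in range(1000))
    print(tail_lines(path, 3, block=16))
    # [b'line 997', b'line 998', b'line 999']
```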
../.. resolution of ls
Hi, It seems that ../../ cannot be resolved symbolically by ls. See the following example. I'd like `ls ..` to print both a and b. Unfortunately, it only prints b because it thinks it is in /tmp/i/a/b instead of /tmp/i/b. Is there a way to use the symbolic pwd instead of the absolute pwd? Thanks.

/tmp/i$ tree
.
├── a
│   └── b
└── b -> a/b/
/tmp/i$ cd b
/tmp/i/b$ ls -H ../
b
/tmp/i/b$ ls ../
b

-- Regards, Peng
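The difference between resolving ".." textually (what the shell's logical pwd does) and physically (what the kernel does when ls opens the path) can be reproduced in Python. This sketch assumes a POSIX system where symlinks can be created; it rebuilds the layout from the example in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)
    os.makedirs(os.path.join(root, "a", "b"))
    os.symlink(os.path.join("a", "b"), os.path.join(root, "b"))

    via_link = os.path.join(root, "b", "..")
    logical = os.path.normpath(via_link)   # textual: drops "b/.."
    physical = os.path.realpath(via_link)  # follows the symlink first

    logical_listing = sorted(os.listdir(logical))
    physical_listing = sorted(os.listdir(physical))
    print(logical_listing)   # ['a', 'b']
    print(physical_listing)  # ['b']
```

In shell terms, the logical variant corresponds to listing the parent of the string in $PWD rather than passing ".." to ls.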
Re: Does -s apply to -m in sort?
Are you the author of -m? If not, maybe the author of -m should knows how it works with -s? If not, maybe this should be documented anyway? On Mon, May 11, 2020 at 5:01 PM Eric Blake wrote: > On 5/11/20 4:18 PM, Peng Yu wrote: > > I used real files (already sorted) to test whether having -s or not > > affect -m. But I have not made minimal example input files so that is > > why I am not sure about my conclusion. > > > > But the command to try is basically `sort -m -k sort_fields files...` > > or `sort -s -m -k sort_fields files..`. > > That's closer - it shows a pseudo-command line you attempted. But it > still does not lend itself to reproducibility, because we don't know > what 'sort_fields' you used, nor what 'files..' contain. > > You also didn't state whether you tried the --debug option, to see if > the presence or absence of -s showed enough debugging crumbs to prove > that you at least tried to analyze the problem yourself. Nor did you > mention whether you read the source code (it _is_ open source, after > all, so instead of asking someone else to do your homework, _you_ can > find the answer). > > > > > I assume to authors who made -m and -s. My question should be clear? > > Unfortunately, your assumption is wrong. A clear question is one that > includes actual examples, and not one that forces someone to reproduce > the work that you could have already provided them. Put it this way: > suppose it took you 5 minutes to come up with a test case, and that > there are 100 list readers interested in your problem, each of whom then > take another 5 minutes to reproduce the setup from your vague > description. Then you have cost 505 minutes of collective time; and > your original work plus the work of each reader results in a very low > signal-to-noise ratio (5/505 is less than 1% new discoveries, and more > than 99% rehash). 
But if it takes you an additional 5 minutes to polish > your query into an email that can then be copy-pasted into a terminal so > that each reader can reproduce the problem in 5 seconds, then your > initial 10 minutes of effort (which is indeed twice the work on your > part) plus 500 seconds of list readers' time results in a much better > ratio of useful new work (5/18.3 is > 27%). Although it costs you more, > your efforts to make everyone else's life easier is magnified by the > number of readers benefitted by your extra efforts. (And that's why I'm > spending so long in writing my reply - to try to teach you that your > historical style of questioning leaves a lot to be desired, as well as a > possibly-futile attempt on my part to get you to recognize that the more > effort YOU put into a good bug report, the less likely you are to be > habitually ignored as someone who merely wastes time). > > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3226 > Virtualization: qemu.org | libvirt.org > > > -- Regards, Peng
Re: Does -s apply to -m in sort?
I used real files (already sorted) to test whether having -s or not affect -m. But I have not made minimal example input files so that is why I am not sure about my conclusion. But the command to try is basically `sort -m -k sort_fields files...` or `sort -s -m -k sort_fields files..`. I assume to authors who made -m and -s. My question should be clear? On 5/11/20, Eric Blake wrote: > On 5/9/20 4:31 PM, Peng Yu wrote: >> It seems that -s of sort is not useful when -m is used based on my >> simple test case. But I am not completely sure. Could anybody let me >> know if this is the case? Thanks. > > Without seeing your simple test case, I cannot presume to know what you > tried or failed to try in making your determination, and lack the > information necessary to repeat the experiment myself. Help us help you > - when asking a question, give us enough relevant details, including the > contents of the files and the command line you attempted your experiment > with. > > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3226 > Virtualization: qemu.org | libvirt.org > > -- Regards, Peng
Does -s apply to -m in sort?
It seems that -s of sort is not useful when -m is used based on my simple test case. But I am not completely sure. Could anybody let me know if this is the case? Thanks. -- Regards, Peng
How to ls a directory with the directory path prepended?
When I `ls` a directory, the content will be shown without the directory path. Is there an option of `ls` to prepend the directory path? Note that I am not looking for this way, as it involves shell. ls d/* Thanks. -- Regards, Peng
Which sha sum is the fastest?
I got the following run times on a file of 116M. They are ranked in this order. Is this runtime order true in general?

sha1sum < sha384sum <~ sha512sum < sha256sum <~ sha224sum

==> sha1sum <==
real 0m0.330s
user 0m0.275s
sys  0m0.042s

==> sha224sum <==
real 0m0.679s
user 0m0.640s
sys  0m0.029s

==> sha256sum <==
real 0m0.668s
user 0m0.633s
sys  0m0.027s

==> sha384sum <==
real 0m0.380s
user 0m0.350s
sys  0m0.027s

==> sha512sum <==
real 0m0.388s
user 0m0.354s
sys  0m0.027s

-- Regards, Peng
altchars for base64
Hi, Python's base64 decoder has the altchars option.

https://docs.python.org/3/library/base64.html

base64.b64decode(s, altchars=None, validate=False)

But I don't see such an option in coreutils' base64. Can this option be added? Thanks.

-- Regards, Peng
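For reference, this is what the altchars option does on the Python side: it swaps the two non-alphanumeric characters of the alphabet. The example below uses "-" and "_" (the URL-safe alphabet), which is only one possible choice. Newer coreutils releases also ship a separate basenc tool with a --base64url mode that covers this particular alphabet.

```python
import base64

# '-' and '_' replace '+' and '/' in the output alphabet.
encoded = base64.b64encode(b"\xfb\xff", altchars=b"-_")
print(encoded)                                    # b'-_8='
print(base64.b64decode(encoded, altchars=b"-_"))  # b'\xfb\xff'
```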
sort by hex number?
I have a TSV file with a column in hex format, e.g., 0x1a000, 0x17000, 0xe000. Is there a way to sort the rows by this column in hex? Thanks. -- Regards, Peng
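One way to get this ordering outside of sort is to convert the column to an integer and use that as the sort key. A minimal Python sketch, assuming tab-separated lines and values that `int(..., 16)` accepts (with or without the 0x prefix); the function name and parameters are illustrative.

```python
def sort_by_hex_column(lines, column=0, sep="\t"):
    """Sort lines numerically by a hexadecimal column (0-based index)."""
    return sorted(lines, key=lambda line: int(line.split(sep)[column], 16))

rows = ["0x1a000\tx", "0x17000\ty", "0xe000\tz"]
print(sort_by_hex_column(rows))
# ['0xe000\tz', '0x17000\ty', '0x1a000\tx']
```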
Re: What is the difference between unlink and rm -f?
So a one-line summary is: when the target can be deleted, unlink and rm -f are the same; otherwise, unlink will complain about the error and exit with 1, but rm -f will do neither.

On 1/29/20, Kaz Kylheku (Coreutils) <962-396-1...@kylheku.com> wrote:
> On 2020-01-29 01:45, Peng Yu wrote:
>> Hi,
>>
>> It seems to me unlink and rm -f are the same if the goal is the delete
>> files. When are they different? Thanks.
>
> I answered this on Unix Stackexchange in 2016:
>
> https://unix.stackexchange.com/a/326711/16369
>
> :)
>

-- Regards, Peng
Re: Show directory time as the latest time of the file in the directory (including subdirs)
No. -t just shows the time of the directory itself. I want a summary time which is the latest time of all the contents (including the ones in the subdirectories, subsubdirs, ...) in the directory.

On 1/29/20, Bernhard Voelker wrote:
> On 1/29/20 10:58 AM, Peng Yu wrote:
>> Hi,
>>
>> For directories, ls shows in the time of the directory itself.
>> Sometimes, it is more important to show the latest time of files in
>> the directory in addition to the directory time.
>>
>> Is there an easy way to show such information? Thanks.
>
> I'm afraid I don't understand fully what you want to achieve.
> Please give a small example.
> Do you mean the -t option of 'ls'?
>
> Have a nice day,
> Berny
>

-- Regards, Peng
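The "summary time" described here is the maximum modification time over a directory tree. A small Python sketch of that computation, assuming mtime is the time of interest and that symlinks should not be followed (`latest_mtime` is a hypothetical name):

```python
import os
import tempfile

def latest_mtime(root: str) -> float:
    """Newest modification time of root or anything beneath it."""
    newest = os.lstat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            newest = max(newest, os.lstat(path).st_mtime)
    return newest

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    f = os.path.join(tmp, "sub", "new.txt")
    open(f, "w").close()
    # Give the nested file a timestamp later than the directories' own.
    os.utime(f, (2_000_000_000, 2_000_000_000))
    print(latest_mtime(tmp))
```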
Show directory time as the latest time of the file in the directory (including subdirs)
Hi, For directories, ls shows the time of the directory itself. Sometimes, it is more important to show the latest time of files in the directory in addition to the directory time. Is there an easy way to show such information? Thanks. -- Regards, Peng
What is the difference between unlink and rm -f?
Hi, It seems to me unlink and rm -f are the same if the goal is to delete files. When are they different? Thanks. -- Regards, Peng
how to use touch to change the change time?
Hi, I don't see how to change the change time (ctime) with touch. Is it possible with touch? Thanks.

--time=WORD   change the specified time:
                WORD is access, atime, or use: equivalent to -a
                WORD is modify or mtime: equivalent to -m

-- Regards, Peng
Re: How to implement the V comparison used by sort in python?
Are you sure they are 100% compatible with V? I don’t want to use them just later find they are not 100% compatible. On Sat, Oct 26, 2019 at 4:24 PM Assaf Gordon wrote: > Hello, > > > On Oct 25, 2019, at 8:00 PM, Peng Yu wrote: > > > I'd like to mimic the V sort order in python. Is there any easy to use > comparison available in python? > > > A simple online search will show several python packages that can do it. > For example: > > https://deb-pkg-tools.readthedocs.io/en/latest/api.html#module-deb_pkg_tools.version > > -assaf > -- Regards, Peng
How to implement the V comparison used by sort in python?
Hi, I'd like to mimic the V sort order in python. Is there any easy to use comparison available in python? The following implementation is simple but it is not exactly the same as the sort order of V used in sort. Thanks. https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/ -- Regards, Peng
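For comparison, a short natural-sort key in the spirit of the blog post linked above looks roughly like this. It is one of several possible definitions of "natural order" and is not claimed to match the -V order of sort; `natural_key` is a name chosen for the example, and the case folding is an assumption.

```python
import re

def natural_key(s: str):
    """Split into digit and non-digit runs; compare digit runs as integers."""
    parts = re.split(r"(\d+)", s)
    # re.split with one capturing group puts the digit runs at odd indexes.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

print(sorted(["z10", "z2", "z1"], key=natural_key))  # ['z1', 'z2', 'z10']
```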
Can natural sort support be added?
Hi, Since natural sort is provided in a few languages (as mentioned in the Wikipedia page), can it be supported by `sort` in addition to version-sort? https://en.wikipedia.org/wiki/Natural_sort_order -- Regards, Peng
Re: Is natural sort supported?
> At the risk of arguing over semantics, > I'll say again: there is no "one correct" natural order standard, > and therefore it is not "plain and simple" because there is no just > "one" such order. I don't think there is no commonly accepted "natural sort". For example, I found another one that uses the same order as the python one that I showed above. The so-called version sort in corutils' sort is just not natural sort and it should not be called natural sort. $ printf '%s\n' G . | csvtk sort -k 1:N G . $ printf '%s\n' 1G 1. | csvtk sort -k 1:N 1G 1. $ printf '%s\n' 1G13 1.02 | csvtk sort -k 1:N 1G13 1.02 Wikipedia also explains what natural sort is and provided a few implementation links. I don't think any of them implemented the version sort as the natural sort. https://en.wikipedia.org/wiki/Natural_sort_order > and note that even the above blog writes: > "... Don't let Ned's clever Python ten-liner fool you. Implementing a > natural sort is more complex than it seems ... ". I don't understand this sentence. There is an implementation with just a few lines in python. Unless this implementation is wrong, then there is a simple implementation at least in python. -- Regards, Peng
Re: Is natural sort supported?
Some part of the manual is also poorly written. "1.1.2 Origin of version sort and differences from natural sort" After reading the above section, I am still not clear what is the difference. It is better to show some examples to illustrate the difference. On 10/8/19, Peng Yu wrote: > Then, the option name causes misunderstand. -V is actually > --debian-version. And it is not natural order (there is no such thing > like extension handling with natural order). The natural order is > plain and simple, just as what is explained below, which can be > implemented by a few lines of python code. > > https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/ > > So my question is whether natural order as in the above URL is supported? > > On 10/8/19, Assaf Gordon wrote: >> Hello, >> >> On 2019-10-08 12:36 a.m., Peng Yu wrote: >>> The following example shows that version sort is not natural sort. Is >>> natural sort supported in by `sort`? >> >> There is no such thing as "THE correct natural sort" order... >> >>> $ printf '%s\n' 1G13 1.02 | LC_ALL=C sort -k 1,1V # The result order >>> should have been reversed. >> >> ... therefore "should have" is simply incorrect expectation. >> >> You might think it "should" be one way, and other implementations >> think it "should" be another way. >> >> For more details, please see the attached HTML file for details. >> >> (this HTML file is a new chapter of the coreutils manual that will be >> included in the next release. The source texinfo is here: >> https://git.savannah.gnu.org/cgit/coreutils.git/tree/doc/sort-version.texi >> ). >> >> regards, >> - assaf >> >> > > > -- > Regards, > Peng > -- Regards, Peng
Re: Is natural sort supported?
Then, the option name causes misunderstand. -V is actually --debian-version. And it is not natural order (there is no such thing like extension handling with natural order). The natural order is plain and simple, just as what is explained below, which can be implemented by a few lines of python code. https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/ So my question is whether natural order as in the above URL is supported? On 10/8/19, Assaf Gordon wrote: > Hello, > > On 2019-10-08 12:36 a.m., Peng Yu wrote: >> The following example shows that version sort is not natural sort. Is >> natural sort supported in by `sort`? > > There is no such thing as "THE correct natural sort" order... > >> $ printf '%s\n' 1G13 1.02 | LC_ALL=C sort -k 1,1V # The result order >> should have been reversed. > > ... therefore "should have" is simply incorrect expectation. > > You might think it "should" be one way, and other implementations > think it "should" be another way. > > For more details, please see the attached HTML file for details. > > (this HTML file is a new chapter of the coreutils manual that will be > included in the next release. The source texinfo is here: > https://git.savannah.gnu.org/cgit/coreutils.git/tree/doc/sort-version.texi > ). > > regards, > - assaf > > -- Regards, Peng
Is natural sort supported?
Hi, The following example shows that version sort is not natural sort. Is natural sort supported by `sort`?

$ printf '%s\n' G . | LC_ALL=C sort -k 1,1V
.
G
$ printf '%s\n' 1G 1. | LC_ALL=C sort -k 1,1V
1.
1G
$ printf '%s\n' 1G13 1.02 | LC_ALL=C sort -k 1,1V # The result order should have been reversed.
1G13
1.02

-- Regards, Peng
Re: How to sort unicode properly?
If python can have pyuca that works across platform, why such thing can not have at C level? On Wed, Sep 25, 2019 at 12:24 PM Eric Blake wrote: > On 9/25/19 10:56 AM, Peng Yu wrote: > > I want to make my `sort` to be machine-independent and always use the > > correct Unicode sort order. Is there a way to do so? > > Those two goals are somewhat at odds. The only truly portable > machine-independent sorting is the one guaranteed by POSIX when you use > LC_ALL=C (fun fact: even on an EBCDIC machine, that is required by POSIX > to collate in ASCII order, rather than native byte order). The moment > you use any other locale, then you not only left to the mercies of > whoever wrote that locale, but also stuck with the fact that there is no > portable way to transfer locale definitions from one vendor's libc to > another. > > > > > I don't know how to check where en_US.UTF-8 comes from. Do you know > > how to check it? (I use Mac OS X.) > > All other locales are somewhat vendor-dependent; as you've discovered, > your vendor (Apple) has a rather gaping hole in their locale support. > But because Apple is a closed-source shop, it will have to be Apple that > fixes their bug, unless you want to take on the gargantuan task of > writing a gnulib module that provides locale tables to mirror glibc for > use on non-glibc machines. > > Note that glibc doesn't have that problem, at least on my system: > > $ cat /etc/fedora-release > Fedora release 30 (Thirty) > $ rpm -q glibc > glibc-2.29-22.fc30.x86_64 > $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8 sort --debug > sort: text ordering performed using ‘en_US.UTF-8’ sorting rules > cafe > > café > > caff > > > So one option you could pursue is switching to an operating system that > does not curtail your freedoms. > > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3226 > Virtualization: qemu.org | libvirt.org > -- Regards, Peng
Re: How to sort unicode properly?
I want to make my `sort` to be machine-independent and always use the correct Unicode sort order. Is there a way to do so? I don't know how to check where en_US.UTF-8 comes from. Do you know how to check it? (I use Mac OS X.) On 9/25/19, Eric Blake wrote: > On 9/25/19 10:20 AM, Peng Yu wrote: >> Hi, >> >> It seems that "café" should be sorted before "caff" in Unicode. >> >> https://github.com/jtauber/pyuca >> >> But `sort` does not do so. >> >> $ printf '%s\n' cafe caff café | LC_ALL=UTF8 sort >> cafe >> caff >> café >> $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8 sort >> cafe >> caff >> café >> >> How to make `sort` sort according to Unicode order? Thanks. > > You'll have to write a locale definition where strcoll() sorts in the > order you want. Coreutils sort is calling strcoll(), and if it doesn't > sort the way you think it should, the bug is in your locale and not in > coreutils. You'll want to report this issue to whoever provided your > en_US.UTF-8 locale (perhaps glibc?) > > -- > Eric Blake, Principal Software Engineer > Red Hat, Inc. +1-919-301-3226 > Virtualization: qemu.org | libvirt.org > -- Regards, Peng
How to sort unicode properly?
Hi, It seems that "café" should be sorted before "caff" in Unicode. https://github.com/jtauber/pyuca But `sort` does not do so. $ printf '%s\n' cafe caff café | LC_ALL=UTF8 sort cafe caff café $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8 sort cafe caff café How to make `sort` sort according to Unicode order? Thanks. -- Regards, Peng
Can -f of seq take an integer format?
Hi, I only find %.0f to print integers. But it is just a float with no digits after the point. Is there a real integer format in seq? Thanks. $ seq -f '%.0f minutes' 2563199 2563200 2563199 minutes 2563200 minutes $ seq -f '%g minutes' 2563199 2563200 2.5632e+06 minutes 2.5632e+06 minutes 2.5632e+06 minutes $ seq -f '%d minutes' 2563199 2563200 seq: format ‘%d minutes’ has unknown %d directive -- Regards, Peng
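As far as I know, GNU seq's -f option only accepts floating-point conversions (%a, %e, %f, %g and their uppercase forms), so `%.0f` is the usual way to get integer-looking output. A minimal sketch of an alternative, assuming the text can be added after seq has printed plain integers (the helper name `seq_with_suffix` is made up for illustration):

```shell
# Hypothetical helper: print the integers FIRST..LAST, each followed by SUFFIX.
# seq prints plain integers by default; printf's %d then formats each one.
seq_with_suffix() {
  seq "$1" "$2" | while IFS= read -r n; do
    printf '%d %s\n' "$n" "$3"
  done
}

seq_with_suffix 2563199 2563200 minutes
```

This keeps the integer formatting in printf, where %d is available, instead of in seq.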
How to convert a md5sum back to a timestamp?
Hi, Suppose that I know a md5sum that is derived one of the timestamps computed below. Is there a way to quickly derive what the original timestamp is? I could make a database of all the timestamps and their md5sums. But as the total number of entries increases, this solution will not be scalable as the database can be big. Is it there any better solution to this problem? for i in {1..2563200}; do date -d "-$i minutes" +%Y%m%d_%I%M%p; done -- Regards, Peng
How to list not only the contents of a directory but the directory itself as well?
Hi, `ls somedir` without -d shows the contents of a directory. With -d, it shows the information for the directory itself. Is there a way to show both in a single command? Thanks. -- Regards, Peng
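One possible approach, sketched under the assumption that the directory is not empty and that hidden entries do not matter: pass both the directory and a glob of its entries to a single `ls -d` call. The helper name `ls_dir_and_contents` is made up for illustration.

```shell
# Hypothetical helper: list DIR itself, then its (non-hidden) entries,
# in one ls invocation. -d stops ls from descending into directories.
ls_dir_and_contents() {
  ls -ld -- "$1" "$1"/*
}
```

Usage would be `ls_dir_and_contents somedir`. Alternatively, `ls -la somedir` includes a `.` entry that describes the directory itself.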
Re: Is there a way to gzip the temp file used by `sort`?
Thanks. Does this option affect the -m option? Thanks. On 7/1/19, Ed wrote: > On 2019-07-01 10:44-0500, Peng Yu wrote: >> Hi, >> >> The temp files used by `sort` are not gzipped. Is there a way to use >> gzip to save the space used by the temp files? Thanks. > > Did you try --compress-program=gzip? > > -- > Best regards, > Ed http://www.s5h.net/ > > -- Regards, Peng
Is there a way to gzip the temp file used by `sort`?
Hi, The temp files used by `sort` are not gzipped. Is there a way to use gzip to save the space used by the temp files? Thanks. -- Regards, Peng
How to print sizes of both files and directories in a directory?
Hi, `du -h --max-depth=1` only prints directory sizes. Is there a way to print the sizes of both directories and files in a directory? Thanks. -- Regards, Peng
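A minimal sketch, assuming GNU du: adding -a (--all) makes du report files as well as directories, and --max-depth=1 still limits the output to the first level. The helper name `du_files_and_dirs` is made up for illustration.

```shell
# Hypothetical helper: sizes of every file and directory directly under DIR,
# followed by the total for DIR itself.
du_files_and_dirs() {
  du -ah --max-depth=1 -- "$1"
}
```

Usage would be `du_files_and_dirs .` for the current directory.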
Re: How to sort and count efficiently?
The problem with this kind of awk program is that everything will be loaded to memory. But bare `sort` use external files to save memory. When the hash in awk is too large, accessing it can become very slow (maybe due to potential cache miss or slow down of hash as a function of hash size). On Sun, Jun 30, 2019 at 11:52 AM Assaf Gordon wrote: > Correcting myself: > > On Sun, Jun 30, 2019 at 10:08:46AM -0600, Assaf Gordon wrote: > > On Sun, Jun 30, 2019 at 07:34:19AM -0500, Peng Yu wrote: > > > > > > I have a long list of string (each string is in a line). I need to > > > count the number of appearance for each string. > > > > > > [...] Does anybody know any better way > > > to make the sort and count run more efficiently? > > > > > > > Or using gnu awk: > > use 'asorti' instead of 'asort', with the two-parameter variant: > > > $ printf "%s\n" a c b b b b b b c \ > | awk 'a[$1]++ {} >END { n = asorti(a,b) > for (i = 1; i <= n; i++) { > print b[i], a[b[i]] > } >}' > a 1 > b 6 > c 2 > > > For more details see: > > https://www.gnu.org/software/gawk/manual/html_node/Array-Sorting-Functions.html#Array-Sorting-Functions > > -assaf > > -- Regards, Peng
How to sort and count efficiently?
Hi, I have a long list of strings (one string per line). I need to count the number of occurrences of each string. I currently use `sort` to sort the list and then use another program to do the count. The second program doing the count needs only a small amount of memory, as the input is sorted. But `sort` writes a lot of temp files like `sortjISjDY`, which are very large. Because I only need the counts, ideally I'd like these temp files to keep only the count and one copy of each original string, rather than the original string many times. Does anybody know a better way to make the sort and count run more efficiently? -- Regards, Peng
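For reference, the usual pipeline for this task can be sketched as below. It does not stop sort from storing repeated strings in its temp files; `--compress-program=gzip` only asks GNU sort to compress them, which tends to help when the input is highly repetitive. The helper name `count_lines` is made up for illustration.

```shell
# Hypothetical helper: count how often each line occurs on stdin.
# Prints "STRING COUNT", one distinct string per line, in byte order.
count_lines() {
  LC_ALL=C sort --compress-program=gzip |
    uniq -c |
    awk '{ c = $1; sub(/^ *[0-9]+ /, ""); print $0, c }'
}

printf '%s\n' a c b b b b b b c | count_lines
```

The awk step only moves the count that `uniq -c` prints in front of each line to the end of the line.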
Does --parallel apply to merge sort?
Hi, It seems that there is no need to use parallelization for merge sort. So for the following option of `sort`, I think that it only applies to regular sort but not merge sort. Is it so? --parallel=N change the number of sorts run concurrently to N -- Regards, Peng
Re: How to calculate date relative to another date?
> Seems to work fine when date specification is not quite as ambiguous > as "2018/05". > > $ date --iso --date='2018-05-01 5 years ago' > 2013-05-01 What is special about --iso? If I use the following date string, I get a future time. Why? $ date --date='2018-05-01 4 years 11 months ago' +%Y%m 202106 -- Regards, Peng
How to calculate date relative to another date?
Hi, For example, I want to calculate 5 years less a month from May 2018, i.e., "2018/05", the result should be "2013/06". https://www.gnu.org/software/coreutils/manual/html_node/Examples-of-date.html I don't think the direct calculation of this kind of relative date is possible with coreutils' date command. Some kind of external arithmetic calculation must be used. Is it so? -- Regards, Peng
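A minimal sketch, assuming GNU date: relative items can carry explicit signs, which avoids the word "ago". As far as I understand, "ago" negates only the unit directly before it, which would explain the future result reported in the follow-up message. The helper name `years_less_months_before` is made up for illustration, and a full date such as 2018-05-01 is used instead of the ambiguous "2018/05".

```shell
# Hypothetical helper: go back YEARS years from DATE, then forward MONTHS
# months, and print the result as YYYY/MM.
# TZ=UTC0 keeps daylight-saving changes out of the calculation.
years_less_months_before() {
  TZ=UTC0 date --date="$1 -$2 years +$3 months" +%Y/%m
}

years_less_months_before 2018-05-01 5 1
```

With these arguments the result should be 2013/06, i.e. five years less one month before May 2018.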
Re: Why TAB in ansi color is not recognized?
Thanks. Where the `[ K` come from? I only see `[ m` but not `[ K`. What does `[ K` mean? Thanks. http://pueblo.sourceforge.net/doc/manual/ansi_color_codes.html On Sun, Apr 28, 2019 at 2:49 PM Assaf Gordon wrote: > > Hello, > > On 2019-04-28 11:23 a.m., Peng Yu wrote: > > > > In the 2nd example, it is not sorted as what I want. Why is it so? > > > > $ printf '%s\t%s\n' a 1 a 2 |grep --color=always a | sort -k 2,2nr > > a 2 > > a 1 > > $ printf '%s\t%s\n' a 1 a 2 | grep --color=always a$'\t' | sort -k 2,2nr > > a 1 > > a 2 > > > > The 'grep' in the second example highlights *both* the 'a' character > and the 'tab' character. > > This means that the ANSI sequence to restore color (\033 [ m \033 [ K) > appears *after* the tab, and is then parsed by 'sort' as the beginning > of the second field: > > $ printf '%s\t%s\n' a 1 a 2 | grep --color=always a$'\t' | od -tc -An > 033 [ 0 1 ; 3 1 m 033 [ K a \t 033 [ m > 033 [ K 1 \n 033 [ 0 1 ; 3 1 m 033 [ K > a \t 033 [ m 033 [ K 2 \n > > And annotated: > > First line, first field: > 033 [ 0 1 ; 3 1 m 033 [ K a > \t > > First line, second field: > 033 [ m 033 [ K 1 > \n > > Second line, first field: > 033 [ 0 1 ; 3 1 m 033 [ K a > \t > > Second line, second field: > 033 [ m 033 [ K 2 > \n > > > Using "--debug" will give a hint as to what went wrong: > >$ printf '%s\t%s\n' a 1 a 2 | grep --color=always a$'\t' \ > | sort -k2,2nr --debug >sort: using ‘en_CA.utf8’ sorting rules >a>1 > ^ no match for key > >a>2 > ^ no match for key > > > > The "no match for key" message means that the 2nd field failed to be > parsed as a numeric value. > > > > regards, > -assaf > -- Regards, Peng
Why TAB in ansi color is not recognized?
Hi, In the 2nd example, it is not sorted as what I want. Why is it so? $ printf '%s\t%s\n' a 1 a 2 |grep --color=always a | sort -k 2,2nr a 2 a 1 $ printf '%s\t%s\n' a 1 a 2 | grep --color=always a$'\t' | sort -k 2,2nr a 1 a 2 -- Regards, Peng
Is it possible to dd from a position in a file to the end?
Hi, I don't see a way to specify "END" in dd. I don't want to compute the length of a file with another command. Is there a way to let dd dump from a given location to the end? Thanks. -- Regards, Peng
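A minimal sketch, assuming GNU dd: when count= is omitted, dd copies until end of file, so only the starting position needs to be given. `iflag=skip_bytes` makes skip= count bytes instead of blocks. The helper name `dump_from_offset` is made up for illustration.

```shell
# Hypothetical helper: write FILE to stdout, starting at byte OFFSET
# (0-based) and continuing to the end of the file.
dump_from_offset() {
  dd if="$1" bs=64K skip="$2" iflag=skip_bytes 2>/dev/null
}
```

Without dd, `tail -c +N file` prints from byte N (1-based) to the end, so `tail -c +4 file` is roughly equivalent to `dump_from_offset file 3`.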
tail -f finish upon another process finish writing to the file
Hi, I use tail -f to show a file as it grows. However, if the process which writes to the file is finished, tail -f will still wait there. Is there a way to let tail -f finish once it detects nobody writes to the file? Thanks. -- Regards, Peng
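A minimal sketch, assuming GNU tail and that the process ID of the writer is known: `--pid=PID` makes `tail -f` exit after that process has terminated. tail cannot detect on its own that "nobody writes to the file". The helper name `follow_until_exit` is made up for illustration.

```shell
# Hypothetical helper: show FILE from its first line and keep following it
# until the process with id PID has exited.
follow_until_exit() {
  tail --pid="$2" -n +1 -f -- "$1"
}
```

Usage would look like `some_writer > log.txt & follow_until_exit log.txt $!`, where `some_writer` stands for whatever command produces the file.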
What tricks used in readlink to make it faster than realpath bash loadable?
Hi, `readlink` is faster than `realpath` for a large number of input arguments. Note that the former starts slower than the latter. What tricks are used in readlink to make it faster? Thanks. https://github.com/bminor/bash/blob/master/examples/loadables/realpath.c bash> builtin enable -f ~/Downloads/bash-4.4/examples/loadables/realpath realpath bash> type realpath realpath is a shell builtin bash> type readlink readlink is /usr/local/opt/coreutils/libexec/gnubin/readlink bash> readlink -e . > /dev/null real 0m0.014s user 0m0.003s sys 0m0.006s bash> realpath . > /dev/null real 0m0.003s user 0m0.001s sys 0m0.002s bash> readlink -e $(printf '. %.0s' {1..1}) > /dev/null real 0m0.200s user 0m0.078s sys 0m0.121s bash> realpath $(printf '. %.0s' {1..1}) > /dev/null real 0m0.211s user 0m0.105s sys 0m0.103s -- Regards, Peng
Understanding stdbuf
I thought that the -oL option will wait until a line is finished in the line buffer. So I'd expect the following output of stdbuf -oL -eL ./script.sh. abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz But the actual results are interleaved. Could anybody help me understand how stdbuf works? Thanks. $ cat script.sh #!/usr/bin/env bash # vim: set noexpandtab tabstop=2: for x in {a..z} do echo -n "$x" echo -n "$x" >&2 done echo echo >&2 $ stdbuf -oL -eL ./script.sh aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz $ -- Regards, Peng
Re: performance bug of `wc -m`
$ wc --version wc (GNU coreutils) 8.29 Copyright (C) 2017 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Written by Paul Rubin and David MacKenzie. $ seq 100 | time wc -m 696 2.69 real 2.52 user 0.03 sys $ seq 100 | time ./wcm.py 696 1.30 real 1.18 user 0.04 sys On Sun, May 13, 2018 at 12:54 PM, Assaf Gordon wrote: > Hello, > > On Sun, May 13, 2018 at 09:05:47AM -0400, Peng Yu wrote: >> I am on Mac not on Linux. On Linux, I can confirm that `wc -m` is much >> faster than `wcm.py`. > > As a first step, please run "wc --version" to confirm you are using > gnu coreutils' wc and not the macos native wc program. > >> Here is the output on Mac. >> >> $ seq 100 > num.txt >> $ time wc -m < num.txt >> 696 >> >> real0m2.751s >> user0m2.622s >> sys0m0.042s >> $ time ./wcm.py < num.txt >> 696 >> >> real0m1.401s >> user0m1.234s >> sys0m0.051s > > Assuming it is coreutils' wc, I suspect file caching still plays > a significant role here. > > Can you try: > >seq 100 | time wc -m >seq 100 | time ./wcm.py > > And report the timing ? > > regards, > - assaf -- Regards, Peng
Re: performance bug of `wc -m`
I am on Mac not on Linux. On Linux, I can confirm that `wc -m` is much faster than `wcm.py`. Here is the output on Mac. $ seq 100 > num.txt $ time wc -m < num.txt 696 real0m2.751s user0m2.622s sys0m0.042s $ time ./wcm.py < num.txt 696 real0m1.401s user0m1.234s sys0m0.051s $ cat wcm.py #!/usr/bin/env python # vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8: import sys l = 0 for line in sys.stdin: l += len(line.decode('utf-8')) print l On Sun, May 13, 2018 at 2:18 AM, Assaf Gordon wrote: > Hello, > > On 12/05/18 07:55 PM, Peng Yu wrote: >> >> The following example shows that `wc -m` is even slower than the >> equivalent Python code. Can this performance bug be fixed? > > > I'm unable to reproduce the performance issue, > and suspect other issues are at play. > > First: >> >> import sys >> l = 0 >> for line in sys.stdin: >> l += len(line.rstrip('\n').decode('utf-8')) >> print l > > > This code is not identical to "wc -m" - it does not count the newlines > as characters. Example: > > $ seq 10 | wc -m > 21 > $ seq 10 | ./wcm.py > 11 > >> $ time ./wcm.py < 1.txt >> 6786930 >> $ time wc -m < 1.txt >> 6796930 > > > The fact that you are getting the exact same results indicates that your > input file (1.txt) does not have newlines at all: > > $ seq 10 | tr -d '\n' | ./wcm.py > 11 > $ seq 10 | tr -d '\n' | wc -m > 11 > > > Second: > I suspect the OS's file caching plays a big role in the skewed results. > It would be better to clear the cache and then time it: > > $ seq 100 | tr -d '\n' > 1.txt > $ ls -lhog 1.txt > -rw-r--r-- 1 5.7M May 13 00:05 1.txt > > $ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches" > $ time wc -m < 1.txt > 596 > > real0m0.136s > user0m0.104s > sys 0m0.004s > > versus: > >$ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches" >$ time ./wcm.py < 1.txt >596 > > real0m0.215s > user0m0.040s > sys 0m0.012s > > In my measurements python is twice as slow (for input with no newlines). 
> But the file is so small (5.7MB) that measurements can vary a lot. > > > Third: > If the file does have new lines (as is more common in typical text > files), then python becomes almost order of magnitude slower: > > $ seq 100 > 2.txt > $ ls -lhog 2.txt > -rw-r--r-- 1 6.6M May 13 00:08 2.txt > > $ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches" > $ time wc -m < 2.txt > 696 > > real0m0.158s > user0m0.132s > sys 0m0.000s > > $ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches" > $ time ./wcm.py < 2.txt > 596 > > real0m1.260s > user0m1.104s > sys 0m0.016s > > > > Fourth, > Unless you are certain your input files are valid, > using python2 + utf8 is very fragile, example: > > $ printf '\xEEabc\n' | ./wcm.py > Traceback (most recent call last): > File "./wcm.py", line 5, in > l += len(line.rstrip('\n').decode('utf-8')) > File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode > return codecs.utf_8_decode(input, errors, True) > UnicodeDecodeError: 'utf8' codec can't decode byte 0xee in position 0: > invalid continuation byte > > While 'wc -m' will continue and not crash: > > $ printf '\xEEabc\n' | wc -m > 4 > > > > I hope this resolves the issue. > If you still think this is a bug, please provide more details > and a reproducible example. > > regards, > - assaf -- Regards, Peng
performance bug of `wc -m`
Hi, The following example shows that `wc -m` is even slower than the equivalent Python code. Can this performance bug be fixed? $ cat wcm.py #!/usr/bin/env python # vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8: import sys l = 0 for line in sys.stdin: l += len(line.rstrip('\n').decode('utf-8')) print l $ time ./wcm.py < 1.txt 6786930 real0m0.155s user0m0.059s sys0m0.048s $ time wc -m < 1.txt 6796930 real0m2.350s user0m2.280s sys0m0.017s -- Regards, Peng
What time is `sleep` based on?
For example, suppose I run `sleep 1000`, then put the computer to sleep for 1000s and wake it up. Will the `sleep` finish at the time when the computer wakes up? Or will `sleep` take another 1000 seconds to terminate? Thanks. -- Regards, Peng
Re: Is there a way to print unicode characters and the actual code?
> $ od -An -tx1 -ta -tc <<< 'exámple' > 65 78 c3 a1 6d 70 6c 65 0a >e x C ! m p l e nl >e x 303 241 m p l e \n At this moment, I wrote some python code to do this, which prints both the decoded code as well as the encoded code in both hex and binary numbers in TSV format. `c if ord(c)>31 else repr(str(c)).strip("'")` is hacky. I am not sure if there is a good way get things like \f \b as `od` would. $ cat dumpunicode0.py #!/usr/bin/env python # vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8: import sys for line in sys.stdin: for c in line.decode('utf-8'): utf8_encode = '0x' + ''.join( ['%x' % ord(x) for x in reversed(c.encode('utf-8'))] ) print '\t'.join( ( c if ord(c)>31 else repr(str(c)).strip("'") , '0x%x' % ord(c) , bin(ord(c)).strip("'") , utf8_encode , bin(int(utf8_encode, base=16)).strip("'") ) ) $ ./dumpunicode0.py <<< á á0xe10b11110xa1c30b10111111 \n0xa0b10100xa0b1010 $ printf '\f'| od -xc 000000c \f 001 $ printf '\f'| ./dumpunicode0.py \x0c0xc0b11000xc0b1100 -- Regards, Peng
Is there a way to print unicode characters and the actual code?
I am not sure `od` respects unicode. Is there a tool (maybe different from od) that can print the code in odd lines and the unicode character in even lines? Thanks. $ od -xc <<< 'exámple' 0007865a1c3706d656c000a e x ? ? m p l e \n 011 In this particular case, 65 78 a1c3706d656c000a e x á m p l e \n -- Regards, Peng
Is there a way to print unicode characters and the actual code?
It seems that `od` does not respect the unicode. Is there a tool (maybe different from od) that can print the code in odd lines and the unicode character in even lines? Thanks. $ od -xc <<< 'exámple' 0007865a1c3706d656c000a e x ? ? m p l e \n 011 In this particular case, I'd like it to print something like the following (positions are omitted). Is there a tool for doing so? 65 78 c1 6d 70 6c 65 0a e x á m p l e \n -- Regards, Peng
Mapping of the special characters to the control sequences available?
Hi, The following URL says control-v followed by control-m will insert a CR. https://superuser.com/questions/942217/how-do-i-interactively-type-r-n-terminated-query-in-netcat?answertab=active#tab-top I understand control-v is to enter the next character typed literally. And control-m is a CR. https://en.wikipedia.org/wiki/Carriage_return Is there a complete table of the mapping of the special characters to the control sequences? -- Regards, Peng
Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?
Hi, There are ~7000 .txt files in a directory on glusterfs. Here are the run times of the following two commands. Does anybody know why the find command is much slower than *.txt? Is there a way to change the API that `find` uses to search files so that it can be more friendly to glusterfs? $ time echo *.txt > /dev/null real 0m2.206s user 0m0.039s sys 0m0.056s $ time find -name '*.txt' > /dev/null real 0m18.558s user 0m0.317s sys 0m0.663s -- Regards, Peng
Why cut treats one column input differently for out-of-range field spec?
Hi, If there is only one column in the input, then an out-of-range field spec will result in the whole line being printed. $ cut -f 3 <<< $'a' | xxd 000: 610a a. Otherwise, an empty string is printed. $ cut -f 3 <<< $'a\tb' | xxd 000: 0a . This is counter-intuitive. I think that one-field input should not be treated specially. It should still result in no output for an out-of-range field spec. Is there a strong reason why `cut` should treat one-field input differently? (What if users do want an empty string to be printed for an out-of-range field even for one-field input?) Or should this be considered a bug? -- Regards, Peng
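For context, POSIX specifies that cut passes lines without any field delimiter through unchanged unless -s is given, and with -s such lines are dropped entirely. A minimal sketch of one way to get an empty line instead, using awk in place of cut (the helper name `field_or_empty` is made up for illustration):

```shell
# Hypothetical helper: print field N of each tab-separated input line.
# A missing field yields an empty line, including on one-field input.
field_or_empty() {
  awk -F '\t' -v n="$1" '{ print $n }'
}

printf 'a\n' | cut -s -f 3          # -s: prints nothing for this line
printf 'a\n' | field_or_empty 3     # prints an empty line
```

The awk version keeps one output line per input line, which `cut -s` does not.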
Speed up sort with concurrency
Hi, I see that concurrency can be used to speed up mergesort in golang. Can this be implemented in sort in coreutils? Thanks. https://medium.com/@_orcaman/when-too-much-concurrency-slows-you-down-golang-9c144ca305a -- Regards, Peng
Is there a way to always put NA before (or after) numerical values in sort?
Hi, I want to always put NA before (or after) numerical values being sorted. Is there a way to control this? Thanks. ~$ printf '%s\n' .1 1 NA | sort -k 1,1rg 1 .1 NA ~$ printf '%s\n' .1 1 NA | sort -k 1,1g NA .1 1 -- Regards, Peng
How to sort alphabetically?
Hi, "B" is listed before "a". Is there a way to sort alphabetically (as in an English dictionary)? (I think LC_* might need to be used, but I am not sure what value it should be.) Thanks. $ printf '%s\n' a B c | sort B a c -- Regards, Peng
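A minimal sketch of one locale-independent option: `sort -f` (--ignore-case) folds lowercase to uppercase before comparing, so "a" sorts before "B" even in the C locale. The helper name `dictionary_sort` is made up for illustration. A locale such as en_US.UTF-8 may give a similar order where its collation tables are complete.

```shell
# Hypothetical helper: sort stdin without regard to letter case.
# LC_ALL=C keeps the result independent of the system's locale data.
dictionary_sort() {
  LC_ALL=C sort -f
}

printf '%s\n' a B c | dictionary_sort
```

Lines that differ only in case are still ordered by a final byte-wise comparison, so "A" comes before "a".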
Sort differently on mac with some LC_ALL
On mac, all the following LC_ALL result in the same results of sort. LC_ALL=en_US.UTF-8 sort <<< $'a\nb\nA\nB' A B a b LC_ALL=en_US sort <<< $'a\nb\nA\nB' A B a b LC_ALL=C sort <<< $'a\nb\nA\nB' A B a b But they are not all the same on linux. Does anybody know a LC_ALL on mac that would make sort sort differently? Thanks. LC_ALL=en_US.UTF-8 sort <<< $'a\nb\nA\nB' a A b B LC_ALL=en_US sort <<< $'a\nb\nA\nB' A B a b LC_ALL=C sort <<< $'a\nb\nA\nB' A B a b -- Regards, Peng
Does -e overrule -f in readlink?
Hi, It seems that -e overrules -f in readlink at least according to the following. If so, when -e is specified, specification of -f does not change the result of readlink. Is it the case? tmpdir=$(mktemp -d) cd "$tmpdir" ln -s z.txt d.txt readlink -f d.txt readlink -f -e d.txt || echo "$?" readlink -e d.txt || echo "$?" -- Regards, Peng
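A minimal sketch for checking this, under the assumption (not confirmed in this thread) that GNU readlink treats -f, -e and -m as alternative modes and that the last one on the command line wins. If that holds, `-f -e` behaves like -e, while `-e -f` behaves like -f. The function name `readlink_demo` is made up for illustration.

```shell
# Demo: in a scratch directory, d.txt is a symlink to the missing file z.txt.
# -f tolerates a missing last component; -e requires everything to exist.
readlink_demo() (
  tmpdir=$(mktemp -d) || exit 1
  cd "$tmpdir" || exit 1
  ln -s z.txt d.txt
  for opts in "-f" "-e" "-f -e" "-e -f"; do
    # $opts is intentionally unquoted so that "-f -e" splits into two options.
    if readlink $opts d.txt >/dev/null 2>&1; then
      echo "readlink $opts: ok"
    else
      echo "readlink $opts: fail"
    fi
  done
  cd / && rm -rf "$tmpdir"
)

readlink_demo
```

Comparing the `-f -e` and `-e -f` lines of the output shows whether the option order matters on a given system.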
Is there a way to specify the next business day in date?
Hi, I don't see a way to specify the next business day in date. Does anybody see if it is possible with date? -- Regards, Peng
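I am not aware of a "business day" keyword in GNU date's date strings. A minimal sketch of a workaround, assuming that a business day simply means Monday to Friday (public holidays are not considered), is to step forward one day at a time and check the weekday. The helper name `next_business_day` is made up for illustration.

```shell
# Hypothetical helper: print the first Monday-Friday date strictly after
# DATE (given as YYYY-MM-DD). Public holidays are NOT taken into account.
next_business_day() (
  d=$(TZ=UTC0 date --date="$1 +1 day" +%F) || exit 1
  # %u prints the ISO weekday number: 1 = Monday ... 7 = Sunday.
  while [ "$(TZ=UTC0 date --date="$d" +%u)" -ge 6 ]; do
    d=$(TZ=UTC0 date --date="$d +1 day" +%F) || exit 1
  done
  printf '%s\n' "$d"
)

next_business_day 2024-05-17
```

A list of holidays could be handled by adding a second condition to the loop that checks whether the candidate date appears in that list.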
ls when some directory only has one file/dir?
Hi, github can directly show the nested dir when a directory only has one subdir (e.g., inst/include on the following webpage). https://github.com/imbs-hl/ranger/tree/master/ranger-r-package/ranger I think that this is a good idea. Maybe this feature should be included in ls as well? -- Regards, Peng
How to ignore an empty file with paste
Hi, This example shows that an empty file will be used to create an empty column. But in some cases, it makes more sense to just ignore such a column. Is there a way to instruct paste to ignore an empty file? $ > empty_file $ paste empty_file <(seq 3) 1 2 3 -- Regards, Peng
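I do not know of a paste option for this. A minimal sketch of a wrapper, assuming the inputs are regular files (so it would not work with process substitutions such as `<(seq 3)`): keep only the arguments for which `[ -s FILE ]` is true and hand those to paste. The helper name `paste_nonempty` is made up for illustration.

```shell
# Hypothetical helper: run paste on those FILE arguments that are non-empty.
paste_nonempty() {
  for f in "$@"; do
    shift
    # -s is true when the file exists and its size is greater than zero;
    # such files are appended back to the argument list.
    [ -s "$f" ] && set -- "$@" "$f"
  done
  [ "$#" -gt 0 ] && paste -- "$@"
}
```

Usage would be `paste_nonempty empty_file numbers.txt`, where `numbers.txt` stands for any non-empty file.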
What is the best way to touch a file and set its time of the last time of a bunch of other files?
Hi, `touch -r` allows one to set the time of a file same as a reference file. What if one wants to set the time to be the last time of multiple files? Is there an easy way to do so? -- Regards, Peng
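A minimal sketch of one way to do this in the shell: find the most recently modified file with the `-nt` test (supported by bash, dash and ksh, though not required by POSIX), then pass it to `touch -r`. The helper name `touch_newest` is made up for illustration.

```shell
# Hypothetical helper: give TARGET the timestamps of the newest of FILES.
# Usage: touch_newest TARGET FILE...
touch_newest() {
  target=$1
  shift
  newest=$1
  for f in "$@"; do
    # -nt: true when $f was modified more recently than $newest.
    [ "$f" -nt "$newest" ] && newest=$f
  done
  touch -r "$newest" -- "$target"
}
```

Note that `touch -r` copies both the access and the modification time of the reference file unless -a or -m is added.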
Why cp a directory into itself still create an empty directory?
Hi, The following code shows that copying a directory into itself with cp still creates the tmp directory in the destination. Would it be better not to create it? /tmp$ mkdir tmp /tmp$ $(type -P cp) -r tmp tmp /usr/local/opt/coreutils/libexec/gnubin/cp: cannot copy a directory, ‘tmp’, into itself, ‘tmp/tmp’ /tmp$ ls -lgd /tmp/tmp drwxr-xr-x 3 wheel 102 Jun 14 11:05 /tmp/tmp -- Regards, Peng
Is `ls` exactly the same as `dir`?
Hi, It seems that `ls` and `dir` are exactly the same after I read the man pages. Is it the case? -- Regards, Peng
ls does not show broken links in red (coreutils installed from MacPorts)
Hi, `ls` does not show broken links in red. Does anybody know what is wrong? I show the things with ls below. /tmp$ echo $LS_COLORS /tmp$ dircolors LS_COLORS='rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:'; export LS_COLORS /tmp$ echo $TERM xterm-256color /tmp$ dircolors -p # Configuration file for dircolors, a utility to help you set the # LS_COLORS environment variable used by GNU ls with the --color option. # Copyright (C) 1996-2014 Free Software Foundation, Inc. # Copying and distribution of this file, with or without modification, # are permitted provided the copyright notice and this notice are preserved. 
# The keywords COLOR, OPTIONS, and EIGHTBIT (honored by the # slackware version of dircolors) are recognized but ignored. # Below, there should be one TERM entry for each termtype that is colorizable TERM Eterm TERM ansi TERM color-xterm TERM con132x25 TERM con132x30 TERM con132x43 TERM con132x60 TERM con80x25 TERM con80x28 TERM con80x30 TERM con80x43 TERM con80x50 TERM con80x60 TERM cons25 TERM console TERM cygwin TERM dtterm TERM eterm-color TERM gnome TERM gnome-256color TERM hurd TERM jfbterm TERM konsole TERM kterm TERM linux TERM linux-c TERM mach-color TERM mach-gnu-color TERM mlterm TERM putty TERM putty-256color TERM rxvt TERM rxvt-256color TERM rxvt-cygwin TERM rxvt-cygwin-native TERM rxvt-unicode TERM rxvt-unicode-256color TERM rxvt-unicode256 TERM screen TERM screen-256color TERM screen-256color-bce TERM screen-bce TERM screen-w TERM screen.Eterm TERM screen.rxvt TERM screen.linux TERM st TERM st-256color TERM terminator TERM vt100 TERM xterm TERM xterm-16color TERM xterm-256color TERM xterm-88color TERM xterm-color TERM xterm-debian # Below are the color init strings for the basic file types. A color init # string consists of one or more of the following numeric codes: # Attribute codes: # 00=none 01=bold 04=underscore 05=blink 07=reverse 08=concealed # Text color codes: # 30=black 31=red 32=green 33=yellow 34=blue 35=magenta 36=cyan 37=white # Background color codes: # 40=black 41=red 42=green 43=yellow 44=blue 45=magenta 46=cyan 47=white #NORMAL 00 # no color code at all #FILE 00 # regular file: use no color at all RESET 0 # reset to "normal" color DIR 01;34 # directory LINK 01;36 # symbolic link. (If you set this to 'target' instead of a # numerical value, the color is as for the file pointed to.) 
MULTIHARDLINK 00 # regular file with more than one link FIFO 40;33 # pipe SOCK 01;35 # socket DOOR 01;35 # door BLK 40;33;01 # block device driver CHR 40;33;01 # character device driver ORPHAN 40;31;01 # symlink to nonexistent file, or non-stat'able file SETUID 37;41 # file that is setuid (u+s) SETGID 30;43 # file that is setgid (g+s) CAPABILITY 30;41 # file with capability STICKY_OTHER_WRITABLE 30;42 # dir that is sticky and other-writable (+t,o+w) OTHER_WRITABLE 34;42 # dir that is other-writable (o+w) and not sticky STICKY 37;44 # dir with the sticky bit set (+t) and not other-writable # This is for files with execute permission: EXEC 01;32 # List any file extensions like '.gz' or '.tar' that you would like ls # to colorize below. Put the extension, a space, and the color init string. # (and any comments you want to add after a '#') # If you use DOS-style suffixes, you may want to uncomment the following: #.cmd 01;32 # executables (bright green) #.exe 01;32 #.com 01;32 #.btm 01;32 #.bat 01;32 # Or if you want to colorize scripts even if they do not have the # executable bit actually set. #.sh 01;32 #.csh 01;32 # archives or compressed (bright red) .tar 01;31 .tgz 01;31 .arc 01;31 .arj 01;31 .taz 01;31 .lha 01;31 .lz4 01;31 .lzh 01;3
Re: Does sort handle -t / correctly
On Fri, Apr 17, 2015 at 2:05 PM, Peng Yu wrote: > On Fri, Apr 17, 2015 at 12:31 PM, Eric Blake wrote: >> On 04/17/2015 11:03 AM, Peng Yu wrote: >>> On Fri, Apr 17, 2015 at 11:26 AM, Eric Blake wrote: >>>> On 04/17/2015 10:10 AM, Peng Yu wrote: >>>>> Hi, I got the following results when I call sort with -t /. It seems >>>>> that 'a/1.txt' should be right after 'a'. Is it the case? Or I am not >>>>> using sort correctly? >>>> >>>> Your assumption is correct - you are using sort incorrectly, by failing >>>> to take locales into account, and by failing to limit the amount of data >>>> being compared to single field widths. >>> >>> Thanks for the explanation. >>> >>> If I don't know the number of fields, but I want to sort according to >>> all fields (from 1 to whatever the max number of fields), is there a >>> way to do it? >> >> No one has really asked for that before. Are you going to propose some >> possible extension syntax to make it obvious how to generate as many key >> specifications as necessary to fully cover an arbitrary number of fields >> in a line? > > Since no -k options means treat each line just a whole string, maybe > one can allow -k without specifying any columns as treating each line > as all the set of fields in that line? BTW, one application of this syntax is to sort `find` is output. I.e., one want to put things under a directory to right after the directory name itself. My proposed syntax would work for this problem. But maybe there is an alternative solution to this problem? -- Regards, Peng
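One possible alternative for the `find` use case, sketched under the assumption that no path contains the byte 0x01: temporarily replace "/" with a byte that sorts before every printable character, sort byte-wise, and translate back. The helper name `sort_paths` is made up for illustration.

```shell
# Hypothetical helper: sort paths so that the contents of a directory come
# directly after the directory itself.
# "/" becomes byte 0x01, which sorts before "!" and all other printable
# characters in the C locale; it is changed back after sorting.
sort_paths() {
  tr '/' '\001' | LC_ALL=C sort | tr '\001' '/'
}

printf '%s\n' a 'a!' ab aB a/1.txt | sort_paths
```

This needs no -k options at all, so the number of path components does not have to be known in advance.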
Re: Does sort handle -t / correctly
On Fri, Apr 17, 2015 at 12:31 PM, Eric Blake wrote: > On 04/17/2015 11:03 AM, Peng Yu wrote: >> On Fri, Apr 17, 2015 at 11:26 AM, Eric Blake wrote: >>> On 04/17/2015 10:10 AM, Peng Yu wrote: >>>> Hi, I got the following results when I call sort with -t /. It seems >>>> that 'a/1.txt' should be right after 'a'. Is it the case? Or I am not >>>> using sort correctly? >>> >>> Your assumption is correct - you are using sort incorrectly, by failing >>> to take locales into account, and by failing to limit the amount of data >>> being compared to single field widths. >> >> Thanks for the explanation. >> >> If I don't know the number of fields, but I want to sort according to >> all fields (from 1 to whatever the max number of fields), is there a >> way to do it? > > No one has really asked for that before. Are you going to propose some > possible extension syntax to make it obvious how to generate as many key > specifications as necessary to fully cover an arbitrary number of fields > in a line? Since no -k options means treat each line just a whole string, maybe one can allow -k without specifying any columns as treating each line as all the set of fields in that line? -- Regards, Peng
Re: Does sort handle -t / correctly
On Fri, Apr 17, 2015 at 11:26 AM, Eric Blake wrote: > On 04/17/2015 10:10 AM, Peng Yu wrote: >> Hi, I got the following results when I call sort with -t /. It seems >> that 'a/1.txt' should be right after 'a'. Is it the case? Or I am not >> using sort correctly? > > Your assumption is correct - you are using sort incorrectly, by failing > to take locales into account, and by failing to limit the amount of data > being compared to single field widths. Thanks for the explanation. If I don't know the number of fields, but I want to sort according to all fields (from 1 to whatever the max number of fields), is there a way to do it? >> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort -t / -k 1 -k 2 -k 3 -k 4 >> a >> a! >> a/1.txt >> aB >> ab > > sort --debug is your friend: > > $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1 -k 2 -k 3 -k 4 > sort: using ‘en_US.UTF-8’ sorting rules > a > _ > ^ no match for key > ^ no match for key > ^ no match for key > _ > a! > __ > ^ no match for key > ^ no match for key > ^ no match for key > __ > a/1.txt > ___ > _ >^ no match for key >^ no match for key > ___ > ab > __ > ^ no match for key > ^ no match for key > ^ no match for key > __ > aB > __ > ^ no match for key > ^ no match for key > ^ no match for key > __ > > > As shown in the debug trace, the line 'a!' sorts prior to the line > 'a!1.txt' because your first sort key is the entire line, and in the > locale you are using (where both '!' and '/', and also '.', are ignored > in collation orders), the collation string "a" really does come before > "a1txt". > > What you REALLY want is to limit your sorting to a single field at a > time (-k1,1 rather than -k), as in: > > $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1,1 -k 2,2 > sort: using ‘en_US.UTF-8’ sorting rules > a > _ > ^ no match for key > _ > a/1.txt > _ > _ > ___ > a! 
> __ > ^ no match for key > __ > ab > __ > ^ no match for key > __ > aB > __ > ^ no match for key > __ > > > Or additionally, to limit your sorting to a locale that does not discard > punctuation as unimportant, as in: > > $ printf '%s\n' a 'a!' ab aB a/1.txt | LC_ALL=C sort --debug -t / -k 1,1 > -k 2 > sort: using simple byte comparison > a > _ > ^ no match for key > _ > a/1.txt > _ > _ > ___ > a! > __ > ^ no match for key > __ > aB > __ > ^ no match for key > __ > ab > __ > ^ no match for key > __ > > > -- > Eric Blake eblake redhat com+1-919-301-3266 > Libvirt virtualization library http://libvirt.org > -- Regards, Peng
Does sort handle -t / correctly
Hi, I got the following results when I called sort with -t /. It seems that 'a/1.txt' should come right after 'a'. Is that the case, or am I not using sort correctly? $ printf '%s\n' a 'a!' ab aB a/1.txt | sort -t / -k 1 -k 2 -k 3 -k 4 a a! a/1.txt aB ab -- Regards, Peng
Is there a way to inherit the permissions related to o from the parent directory?
Hi, Is there a way to inherit the permissions for o (other) from the parent directory? For example, if the parent has the permission --- for o, then when I mkdir a subdirectory, I want the subdirectory to have the permission --- for o as well. Is it possible to chmod the parent in some way that makes this happen? -- Regards, Peng
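One way to get this effect without changing the parent is to copy the bits at creation time. This is a minimal sketch, assuming GNU stat (for `-c %a`); `mkdir_like_parent` is a hypothetical helper name. It copies only the "other" bits and leaves the user and group bits as mkdir and the umask set them.

```shell
# Hypothetical helper: create DIR, then give it the parent's "other" bits.
mkdir_like_parent() {
  dir=$1
  parent=$(dirname -- "$dir")
  mkdir -- "$dir" || return
  # GNU stat prints the octal mode, e.g. 750; the last digit is "other".
  mode=$(stat -c %a -- "$parent")
  other=${mode#"${mode%?}"}
  case $other in
    0) perms= ;;   1) perms=x ;;  2) perms=w ;;  3) perms=wx ;;
    4) perms=r ;;  5) perms=rx ;; 6) perms=rw ;; 7) perms=rwx ;;
  esac
  # "o=" with nothing after it clears all permissions for other.
  chmod "o=$perms" -- "$dir"
}
```

If every new file should have --- for o regardless of the parent, setting `umask 007` in the shell is the simpler choice.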
Re: Document for + seems to be missing in ls' document
> That's one of the reasons that I _like_ the 'html' version of the > manuals MUCH more than the 'info' version - you can choose to view the > entire manual at once, at which point, a simple 'ctrl-f' will let your > browser find the relevant text within the manual regardless of the > 'texinfo's division of information into sections. The real point is that people want to see the whole manual at once. If so, why not make that choice available on the command line? I find it cumbersome to have to use a browser while I am working at the command line. Is there a way to view the entire Texinfo manual at once from the command line? -- Regards, Peng
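For what it is worth, the stand-alone info reader can dump a manual and all of its subnodes to standard output, which can then be piped to a pager and searched there. A small sketch, assuming the GNU `info` program is installed; `whole_manual` is a hypothetical wrapper name:

```shell
# Hypothetical wrapper: page through a whole info manual at once.
# --subnodes walks every menu item recursively; "-o -" writes to stdout.
whole_manual() {
  info --subnodes -o - "$@" | ${PAGER:-less}
}
```

Usage would be `whole_manual coreutils`, after which `/` in less searches the whole manual regardless of how it is divided into nodes.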
-e missing for ls on Mac OS X
Hi, Mac OS X's ls has an option -e, which is related to ACLs. coreutils' ls does not have this option, which makes it an incomplete replacement for Mac OS X's ls. Is it possible to add this feature to coreutils' ls? -- Regards, Peng
Re: Document for + seems to be missing in ls' document
On Wed, Mar 11, 2015 at 4:25 PM, Eric Blake wrote: > On 03/11/2015 03:13 PM, Peng Yu wrote: >> Hi, >> >> It seems that the document for ls in coreutils does not have an >> explanation of +. Should this be added? Thanks. >> >> http://serverfault.com/questions/227852/what-does-a-mean-at-the-end-of-the-permissions-from-ls-l > > It is already there: > > $ info coreutils 'What information is listed' > ... > Following the file mode bits is a single character that specifies > whether an alternate access method such as an access control list > applies to the file. When the character following the file mode > bits is a space, there is no alternate access method. When it is a > printing character, then there is such a method. > > GNU 'ls' uses a '.' character to indicate a file with an SELinux > security context, but no other alternate access method. > > A file with any other combination of alternate access methods is > marked with a '+' character. Shall the information about "+" be added to the manpage? -- Regards, Peng
Document for + seems to be missing in ls' document
Hi, It seems that the document for ls in coreutils does not have an explanation of +. Should this be added? Thanks. http://serverfault.com/questions/227852/what-does-a-mean-at-the-end-of-the-permissions-from-ls-l -- Regards, Peng
Where are the OPTS bdfgiMhnRrV of --key of sort documented?
Hi, I am trying to find the detailed meaning of each of the letters bdfgiMhnRrV, but I cannot find it in the man page or the info page. Does anybody know where they are documented? Thanks. -- Regards, Peng
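For context, each of these letters is the single-letter form of one of sort's global ordering options (-b, -d, -f, -g, -i, -M, -h, -n, -R, -r, -V), applied to one key only, so the description of each option in the manual applies. A short illustration with made-up sample data:

```shell
# Made-up sample data in the form name,count.
data() { printf '%s\n' 'a,10' 'b,9' 'c,100'; }

# 'n' after the key compares field 2 numerically, like a per-key -n.
data | sort -t , -k 2,2n     # b,9  a,10  c,100

# 'nr' adds reversed order for this key only, like a per-key -n -r.
data | sort -t , -k 2,2nr    # c,100  a,10  b,9
```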
Is there an easy way to generate all English letters?
Hi, seq can generate numbers easily. Does anybody know an easy way to generate all the English letters? -- Regards, Peng
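One possibility is to loop over the ASCII codes and let printf turn each one into a character. This is a sketch for a POSIX shell, with `letters` as a hypothetical function name; in bash, `printf '%s\n' {a..z}` does the same with brace expansion.

```shell
# Hypothetical helper: print the lowercase letters a to z, one per line.
letters() {
  i=97                          # ASCII code of 'a'
  while [ "$i" -le 122 ]; do    # 122 is the ASCII code of 'z'
    # The inner printf yields the code in octal (e.g. 141); the outer
    # printf then interprets \141 as the character 'a'.
    printf "\\$(printf '%03o' "$i")\n"
    i=$((i + 1))
  done
}
```

Piping the result through `tr a-z A-Z` would give the uppercase letters.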
Re: Why the memory usage of sort does not seem to increase as the input file size increases?
> Sort takes a divide and conquer approach, > by sorting parts of the input to temporary files, > and then merging the results with a bounded amount of memory. > > sort currently defaults to using a large memory buffer > to minimize overhead associated with writing and reading > temp files, so you may be seeing just this large memory > allocation each time. > > The memory allocation can be controlled with --buffer-size If I have enough memory, is it always faster to sort without using temp files? How can I force sort to always use memory only? Thanks. -- Regards, Peng
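As a sketch of the --buffer-size suggestion quoted above: the short form is -S, and the size accepts suffixes such as M and G or a percentage of physical memory. As far as I know there is no option that forbids temporary files outright; a buffer large enough for the whole input only makes them unnecessary. The wrapper name `sort_with_buffer` and the file names are hypothetical.

```shell
# Hypothetical wrapper: sort_with_buffer SIZE [sort arguments...]
# SIZE is passed to -S (--buffer-size), e.g. 4G or 50%.
sort_with_buffer() {
  size=$1
  shift
  sort -S "$size" "$@"
}

# Example (hypothetical file names):
#   sort_with_buffer 50% big.txt > sorted.txt
```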
Why the memory usage of sort does not seem to increase as the input file size increases?
Hi, I tried "sort" on a large file, but its memory usage does not seem to be large. This seems strange to me, as I think that sort needs to see all the data before it can complete the sorting process. Shouldn't the memory usage of "sort" increase as the input size increases? Thanks. -- Regards, Peng
Is the command `sort input.txt -o input.txt` OK?
Hi, `sort input.txt -o input.txt` overwrites the input file. My understanding is that sort reads everything and only then writes the output, so it is OK to overwrite the original file. But I want to be sure. Can anyone confirm that this is the case? Thanks. -- Regards, Peng
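For reference, the GNU coreutils manual states that sort reads all input before opening the output file, so sorting a file in place with -o is safe. A minimal sketch, with `sort_in_place` as a hypothetical helper name; putting -o before the file name is the more portable spelling, since options after operands are not accepted everywhere.

```shell
# Hypothetical helper: sort FILE and write the result back to FILE.
sort_in_place() {
  # sort opens the -o file for writing only after it has read all input.
  sort -o "$1" -- "$1"
}
```

This is different from `sort input.txt > input.txt`, where the shell truncates the file before sort starts reading it.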