from:"Peng Yu"

Re: How to get the current time in time zone represented by strings like +0100?

2024-05-16 Thread Peng Yu

> Yes. I think you will find this is all described in the manual at 
> https://www.gnu.org/software/coreutils/manual/html_node/Specifying-time-zone-rules.html

"""
‘TZ="<+0530>-5:30"’ says that the time zone abbreviation is ‘+0530’
and the time zone is 5 hours 30 minutes east of Greenwich.
"""

The negative sign in "-5:30" means east. But why is the time zone abbreviation
"<+0530>" positive? This is confusing. Why not make them consistent?


--
Regards,
Peng

Re: How to get the current time in time zone represented by strings like +0100?

2024-05-16 Thread Peng Yu

My understanding in the last email was wrong.

So +02:00 in your example is actually -0200 in my example. Can date
interpret "+" as in my original example, or do I have to flip
the signs myself?

$ TZ=Europe/Paris date
Thu May 16 15:27:48 CEST 2024
$ TZ='XXX+02:00' date
Thu May 16 11:28:06 XXX 2024
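
A minimal sketch of flipping the sign automatically, assuming GNU date and
an offset written as a sign plus four digits (for example +0100). The
function name date_at_offset is hypothetical. It builds a POSIX TZ value
of the form <+0100>-01:00, where the part in angle brackets is only the
abbreviation and the part after it is the offset counted west of Greenwich.

```shell
# Hypothetical helper: run date in the zone named by an offset such as +0100.
# POSIX TZ offsets are positive WEST of Greenwich, so the sign is inverted.
date_at_offset() {
    off=$1; shift
    case $off in
        +*) sign=- ;;
        -*) sign=+ ;;
        *)  echo "offset must start with + or -" >&2; return 1 ;;
    esac
    digits=${off#?}        # strip the sign, e.g. 0100
    hh=${digits%??}        # first two digits: hours
    mm=${digits#??}        # last two digits: minutes
    TZ="<$off>$sign$hh:$mm" date "$@"
}

# Current time in the zone one hour east of UTC.
date_at_offset +0100
```

Any extra arguments are passed through to date, so a format string such as
'+%H:%M %z' can be appended.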

On Thu, May 16, 2024 at 8:24 AM Peng Yu  wrote:
>
> On Wed, May 15, 2024 at 12:04 AM Grisha Levit  wrote:
> >
> > On Tue, May 14 2024 at 16:05 Peng Yu wrote:
> > > For example, in the time zone represented by +0100, how to get its
> > > current time from date using '+0100' as input? Thanks.
> >
> > Use the offset to create a timezone specification, supplied in the TZ
> > environment variable.
> >
> > TZ='XXX-01:00' date
>
> Strings like +0100 are relative to UTC. For example, +0100 is Central
> European Time. I guess that you understood +0100 as relative to my
> current timezone.
>
> How can I get +0100 treated as relative to UTC with date?
>
> > The `XXX` is an arbitrary (required) name. Note that the sign of the
> > offset has the opposite of its usual meaning. The full format can be
> > found in the tzset(3) man page.
>
> --
> Regards,
> Peng



-- 
Regards,
Peng

Re: How to get the current time in time zone represented by strings like +0100?

2024-05-16 Thread Peng Yu

On Wed, May 15, 2024 at 12:04 AM Grisha Levit  wrote:
>
> On Tue, May 14 2024 at 16:05 Peng Yu wrote:
> > For example, in the time zone represented by +0100, how to get its
> > current time from date using '+0100' as input? Thanks.
>
> Use the offset to create a timezone specification, supplied in the TZ
> environment variable.
>
> TZ='XXX-01:00' date

Strings like +0100 are relative to UTC. For example, +0100 is Central
European Time. I guess that you understood +0100 as relative to my
current timezone.

How can I get +0100 treated as relative to UTC with date?

> The `XXX` is an arbitrary (required) name. Note that the sign of the
> offset has the opposite of its usual meaning. The full format can be
> found in the tzset(3) man page.

-- 
Regards,
Peng

How to get the current time in time zone represented by strings like +0100?

2024-05-14 Thread Peng Yu

Hi,

For example, in the time zone represented by +0100, how to get its
current time from date using '+0100' as input? Thanks.

-- 
Regards,
Peng

Re: mkdir -p competition on the same directory?

2023-02-09 Thread Peng Yu

OK.

I see the following output of `sudo dtruss mkdir -p d`. So
essentially, coreutils first calls the system function mkdir to make the
directory. If the system call fails, it checks whether the target is a
directory. If the target is indeed a directory, then no error message
is printed. Do I understand it correctly?

...
mkdir("d\0", 0x1FF, 0x0) = -1 Err#17
stat64("d\0", 0x7FFEE9953D20, 0x0) = 0 0
...

Therefore, when there is competition among many calls to coreutils
`mkdir -p`, the first instance will create the target, and the remaining
instances will fail on the mkdir system call. But since they find that
the target has already been created and is a directory, they will not
complain about the failed mkdir system call. That is why I never see an
error similar to that of the bash loadable `mkdir -p`. Is that so?
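
The create-then-check pattern described above can be sketched in shell.
This is only an illustration, not the coreutils source: the function name
mkdir_tolerant is hypothetical, and it handles a single path component
(the real `mkdir -p` also creates missing parents).

```shell
# Try to create the directory; if that fails, succeed only when a
# directory already exists at that path (someone else won the race).
mkdir_tolerant() {
    mkdir -- "$1" 2>/dev/null || [ -d "$1" ]
}

# Race eight instances on the same scratch path; none should report failure.
scratch=$(mktemp -d)
for i in 1 2 3 4 5 6 7 8; do
    mkdir_tolerant "$scratch/d" || echo "instance $i failed" &
done
wait
```

If the path exists but is a regular file, the function still fails, which
matches the behaviour one would want from `mkdir -p`.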

On 2/9/23, Pádraig Brady  wrote:
> On 09/02/2023 14:57, Peng Yu wrote:
>> https://lists.gnu.org/archive/html/help-bash/2023-02/msg00053.html
>>
>> Bash loadable `mkdir -p` has a problem when multiple loadable `mkdir
>> -p` instances are called on the same directory simultaneously.
>>
>> But I have never seen coreutils' `mkdir -p` have the same problem. Does
>> coreutils' `mkdir -p` do something extra to guard against the
>> competition on the same directory?
>
> `mkdir d; strace mkdir -p d` would be instructive,
> but yes coreutils mkdir essentially does:
>
>   if mkdir(d) == EEXIST
>     return stat(d) == S_ISDIR
>
> cheers,
> Pádraig
>
>

-- 
Regards,
Peng

mkdir -p competition on the same directory?

2023-02-09 Thread Peng Yu

https://lists.gnu.org/archive/html/help-bash/2023-02/msg00053.html

Bash loadable `mkdir -p` has a problem when multiple loadable `mkdir
-p` instances are called on the same directory simultaneously.

But I have never seen coreutils' `mkdir -p` have the same problem. Does
coreutils' `mkdir -p` do something extra to guard against the
competition on the same directory?

-- 
Regards,
Peng

How to make mv -i return non-zero status when the user chooses n?

2023-01-07 Thread Peng Yu

Hi,

When I use mv -i and choose n so that the destination will not be
overwritten, the return status is still zero. Is there a way to let mv
return nonzero status to reflect that n is chosen by the user? Thanks.
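
One possible workaround is a wrapper that does the prompting itself. This
is a sketch under stated assumptions: the function name mv_confirm is
hypothetical, the destination is assumed to be a file path (not a
directory to move into), and the answer is read from standard input.

```shell
# Prompt before overwriting; return 1 if the user declines.
mv_confirm() {
    if [ -e "$2" ]; then
        printf 'overwrite %s? ' "$2" >&2
        read -r answer
        case $answer in
            [yY]*) ;;              # confirmed: fall through to mv
            *) return 1 ;;         # declined: report it in the exit status
        esac
    fi
    mv -- "$1" "$2"
}

# Usage: mv_confirm src dst || echo "not moved"
```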

-- 
Regards,
Peng

How to count the last line when it does not end with a newline character?

2021-09-04 Thread Peng Yu

I got 1 instead of 2 in the following example. How to count the last
line even when it does not end with a newline character? Thanks.

$ printf 'a\nb'|wc -l
1
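
`wc -l` counts newline characters, so an unterminated last line is not
counted. A small sketch of one alternative, assuming GNU grep: matching the
empty pattern selects every line, including a final line with no newline.
The function name count_lines is hypothetical.

```shell
# Count lines, including a last line that lacks a trailing newline.
count_lines() {
    grep -c '' "$@"
}

printf 'a\nb' | count_lines    # prints 2
```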

-- 
Regards,
Peng

Re: how to speed up sort for partially sorted input?

2021-08-11 Thread Peng Yu

On Wed, Aug 11, 2021 at 1:43 PM Kaz Kylheku (Coreutils)
<962-396-1...@kylheku.com> wrote:
>
> On 2021-08-11 05:03, Peng Yu wrote:
> > On Wed, Aug 11, 2021 at 5:29 AM Carl Edquist 
> > wrote:
> >> (With just a bit more work, you can do all your sorting in a single
> >> awk
> >> process too (without piping out to sort), but i think you'll still be
> >> disappointed with the performance compared to a single sort command.)
> >
> > Yes, this involves many calls of the coreutils' sort, which is not
>
> No, not this last remark, which is about "in a single awk process".

I know there is one awk process. I don't understand why you mentioned it.

> > efficient. Would it make sense to add an option in sort so that sort
> > can sort a partially sorted input in one shot.
>
> IF you're willing to use GNU Coreutils instead of Unix, you probably
> have

I don't think using awk is efficient. I have written a number of awk
programs for simple transformations of the input and tested them; in general,
awk is slower than the equivalent python code, let alone C code.

You talk about doing most of the work in awk below. I don't think
that makes sense. Having coreutils' sort be able to do a partial sort
is a more reasonable solution.

> GNU Awk also. GNU Awk has a sorting function using which a solution
> could be cobbed together. Maybe something like:
>
>
> function dump_delete_data()
> {
>   n = asorti(data, idx);
>   for (i = 1; i <= n; i++)
>     print data[idx[i]];
>   delete data
>   serial = 0
> }
>
> BEGIN        { serial = 0 }
> $1 != prev_1 { dump_delete_data() }
> NF >= 2      { prev_1 = $1
>                data[$2 "." serial++] = $0
>                next }
> 1            { dump_delete_data()
>                print }
> END          { dump_delete_data() }
>
>
> The asorti function has some features behind it to sort in various ways;
> you have to look into that. It involves manipulating a
> PROCINFO["sorted_in"]
> value.
>
> It's possible to use a custom comparison function.
>
> For more info, see GNU Awk documentation, the Gawk mailing list or
> the comp.lang.awk newsgroup.
>
> The purpose of the serial variable in my above code so that we get
> two entries in data[] if in a given group, there are identical $2
> values.
>
> For instance if $2 is "foo", then the key we use is actually "foo.3" if
> the current value of serial is 3. The sorting is then done on these
> suffixed keys, which works okay for lexicographic sorting.
>
> It is not a stable sort, though! Because foo.123 will be sorted before
> foo.23, even though the 123 serial value comes later. If we padded the
> integer with enough leading zeros for the largest possible group, it
> would then be stable: foo.00023 would come before foo.00123:
>
> data[sprintf("%s.%08X", $2, serial++)] = $0
>
> kind of thing.
>
> If you don't care about reproducing duplicates, you can remove this
> logic
> entirely.
>
> How the overall program works is that data[] is an array indexed on the
> second column values (plus serial suffixes). The value of each index
> value is the entire record, $0.
>
> asorti sorts the $2 indices, throwing away the $0 values, which
> is why we direct it into a secondary array called idx, preserving
> the data array. The idx array ends up indexed on integer values 1 to N,
> where N is the chunk size. If we iterate over these values, idx[i]
> gives us the $2 column values (with serial suffix) in sorted order.
> We can then use that as the key into data[] to get the corresponding
> records in sorted order.
>
> Cheers ...
>


-- 
Regards,
Peng

Re: how to speed up sort for partially sorted input?

2021-08-11 Thread Peng Yu

On Wed, Aug 11, 2021 at 5:29 AM Carl Edquist  wrote:
>
> On Tue, 10 Aug 2021, Kaz Kylheku (Coreutils) wrote:
>
> > On 2021-08-07 17:46, Peng Yu wrote:
> >>  Hi,
> >>
> >>  Suppose that I want to sort an input by column 1 and column 2 (column
> >>  1 is of a higher priority than column 2). The input is already sorted
> >>  by column1.
> >>
> >>  Is there a way to speed up the sort (compared with not knowing column
> >>  1 is already sorted)? Thanks.
> >
> > Since you know that column 1 is sorted, it means that a sequential scan
> > of the data will reveal chunks that have the same column 1 value.
> >
> > You just have to read and separate these chunks, and sort each one
> > individually by column 2.
>
> Neat observation.
>
> You could do that tersely in awk by piping each chunk to a separate sort
> process, like:
>
> awk '
> c1 != $1 { close(sort); c1 = $1 }
> { print | sort }
> ' sort="sort -k2,2" partially-sorted-input.txt
>
> In theory, that would bring the sorting work down from ~ O(n * log(n)) to
> ~ O(n * log(n/m)) (for a partially-sorted file with n lines and m
> column-1 chunks of equal size).
>
> But the overhead of starting a new sort process for each chunk is likely
> going to outweigh that advantage.  In the end, just sorting the whole file
> at once (despite column 1 already being sorted) is still likely to be
> faster.
>
> (With just a bit more work, you can do all your sorting in a single awk
> process too (without piping out to sort), but i think you'll still be
> disappointed with the performance compared to a single sort command.)

Yes, this involves many calls of the coreutils' sort, which is not
efficient. Would it make sense to add an option to sort so that sort
can sort a partially sorted input in one shot?
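
For reference, the chunk-by-chunk approach quoted above can be wrapped in a
function and checked against a plain two-key sort. This is a sketch that
assumes whitespace-separated columns and the C locale; the function name
sort_chunks_by_col2 and the awk variable cmd are made up for illustration.

```shell
# Sort each run of equal column-1 values by column 2, one sort process per run.
sort_chunks_by_col2() {
    LC_ALL=C awk -v cmd='sort -k2,2' '
        c1 != $1 { close(cmd); c1 = $1 }   # new chunk: finish the previous sort
        { print | cmd }                    # feed the line to the current sort
    ' "$@"
}

# Input already sorted by column 1.
printf 'a 3\na 1\nb 2\nb 1\n' | sort_chunks_by_col2
```

On such input the result should match `LC_ALL=C sort -k1,1 -k2,2`, which is
the single-process command to compare timings against.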

-- 
Regards,
Peng

how to speed up sort for partially sorted input?

2021-08-07 Thread Peng Yu

Hi,

Suppose that I want to sort an input by column 1 and column 2 (column
1 is of a higher priority than column 2). The input is already sorted
by column1.

Is there a way to speed up the sort (compared with not knowing column
1 is already sorted)? Thanks.

-- 
Regards,
Peng

How to get the past Mon/Tue/.. not after the current date?

2021-07-20 Thread Peng Yu

Hi,

$ date -d 'last Tue' +%Y-%m-%d
2021-07-13
$ date +%Y-%m-%d
2021-07-20
$ date -d 'last Mon' +%Y-%m-%d
2021-07-19

I want to get the most recent given day of the week that is not in the
future. In the above example, I want to get this Tue (2021-07-20) instead
of the last Tue (2021-07-13). But for Mon, I want to get 2021-07-19. Is
there a date string to get such output?
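
I am not aware of a single date string for this, but the arithmetic is
short. A sketch assuming GNU date; the function name last_dow_not_future
is hypothetical, and the weekday is given as a number (1=Mon ... 7=Sun, as
printed by %u). The optional second argument is a reference date, which
defaults to today.

```shell
# Print the most recent date, on or before the reference date, that falls
# on the given weekday.
last_dow_not_future() {
    target=$1
    ref=${2:-$(date +%Y-%m-%d)}
    dow=$(date -d "$ref" +%u)                 # weekday of the reference date
    delta=$(( (dow - target + 7) % 7 ))       # days to step back, 0..6
    date -d "$ref -$delta days" +%Y-%m-%d
}

last_dow_not_future 2 2021-07-20   # Tuesday on a Tuesday: 2021-07-20
last_dow_not_future 1 2021-07-20   # Monday: 2021-07-19
```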

-- 
Regards,
Peng

od and UTF-8

2021-04-01 Thread Peng Yu

Hi,

I am wondering whether there is a way to do something similar to od,
but respect UTF-8 characters.

For example, instead of printing this,

$ od -c -t x1 -Ax <<< α
00   �   �  \n
ce  b1  0a
03

I want to print this. Basically, if it is a printable UTF-8 character that
does not require escaping, just print it as is.

$ od -c -t x1 -Ax <<< α
00   α
ce  b1  0a
03

But if it needs some transform to be seen clearly, it should be done.
For example, a space character should be printed as ' ' (with single
quote printed). Other space characters in UTF-8 should also be printed
with single quotes enclosing the space.

Is there a tool to do so?

-- 
Regards,
Peng

make ls -l dir1 dir2 in the same order as dir1,2 are specified

2021-03-26 Thread Peng Yu

Hi,

When I try `ls -l dir1 dir2`, the order of dir1 and dir2 in the output
is not necessarily the same as the input. How to make it the same as
the input order? Thanks.
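
One simple workaround is to call ls once per directory, in the order given.
A sketch only: the function name ls_in_order is hypothetical, and the
header line it prints imitates the one ls itself prints for multiple
directories.

```shell
# List each directory in the order given on the command line.
ls_in_order() {
    for d in "$@"; do
        printf '%s:\n' "$d"
        ls -l -- "$d"
        printf '\n'
    done
}

# Usage: ls_in_order dir2 dir1
```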

-- 
Regards,
Peng

Re: difference between mkfifo and mknod ... p

2021-03-24 Thread Peng Yu

On Wed, Mar 24, 2021 at 4:52 PM Bob Proulx  wrote:
>
> Peng Yu wrote:
> > It seems that both `mkfifo` and `mknod ... p` can create a fifo. What
> > is the difference between them? Thanks.
>
> The mknod utility existed "for a decade" in Unix (don't quote me on
> that vague time statement) before mkfifo existed.  The mknod utility
> existed in Unix v7 as a thin wrapper around the mknod(2) system call.
>
> man 2 mknod
>
> https://pubs.opengroup.org/onlinepubs/9699919799/functions/mknod.html
>
> Named pipes are special files and special files are created with
> mknod.  At least that was true until mkfifo came along.  mkfifo was
> standardized by POSIX while the mknod utility seems too OS specific
> and never made it into the standards as far as I know.
>
> Therefore "mkfifo" should be used for standards compliance and "mknod"
> should continue to exist for backwards compatibility.

In that case, should a warning message be printed to persuade people
not to use it? Otherwise, people will continue to use it.

By discouraging people from using it for a long period (say 10 years),
its support can be dropped eventually which will reduce future
maintenance costs of this duplicate code.

-- 
Regards,
Peng

Print modification time in compact form

2021-03-15 Thread Peng Yu

Hi,

I see modification time can be printed in this format.

$ stat -c '%y' file.txt
2017-07-31 17:50:54.0 +0100

Is there a way to directly print it as 20170731-1750? Thanks.
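
One way that avoids post-processing the stat output, assuming GNU date:
its -r option takes the time from a file's modification time, and the
usual format string applies. The function name compact_mtime is
hypothetical.

```shell
# Print a file's modification time as YYYYmmdd-HHMM (local time zone).
compact_mtime() {
    date -r "$1" +%Y%m%d-%H%M
}

# Usage: compact_mtime file.txt
```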

-- 
Regards,
Peng

Re: difference between mkfifo and mknod ... p

2021-03-13 Thread Peng Yu

Thanks. Why is there such a redundancy? Is it for backward compatibility?
If not for backward compatibility, I’d think mknod ... p should be removed,
for this syntax is worse than that of mkfifo.
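
A quick check that both commands produce the same kind of file, assuming
GNU coreutils (for `stat -c`). The scratch directory and file names are
arbitrary.

```shell
dir=$(mktemp -d)
mkfifo "$dir/via_mkfifo"       # POSIX utility
mknod  "$dir/via_mknod" p      # older interface, "p" selects a FIFO

# Both should be reported as FIFOs.
for f in "$dir"/via_*; do
    [ -p "$f" ] && echo "$f is a FIFO"
done
stat -c '%F %n' "$dir"/via_*
# Remove the scratch directory with: rm -r "$dir"
```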

On Sat, Mar 13, 2021 at 7:48 AM Steeve McCauley 
wrote:

> Ah, sorry, yeah more or less identical when it comes to making the fifo,
>
> $ strace mknod mknod p 2>&1 | grep -i fifo
> mknod("mknod", S_IFIFO|0666)= 0
> $ strace mkfifo mkfifo 2>&1 | grep -i fifo
> execve("/usr/bin/mkfifo", ["mkfifo", "mkfifo"], 0x7ffe46acfb88 /* 69 vars
> */) = 0
> mknod("mkfifo", S_IFIFO|0666)   = 0
>
> $ ls -l mk*
> prw-r--r-- 1 steeve steeve 0 Mar 13 08:45 mkfifo
> prw-r--r-- 1 steeve steeve 0 Mar 13 08:45 mknod
>
>
>
> On Sat, Mar 13, 2021 at 8:38 AM Peng Yu  wrote:
>
>> But my question is the p of mknod and mkfifo. Are they the same or
>> different?
>>
>> On Sat, Mar 13, 2021 at 5:21 AM Steeve McCauley <
>> steeve.mccau...@gmail.com> wrote:
>>
>>> mknod can make character (c) or block (b) or pipe (p) device files (as
>>> found under /dev).
>>>
>>> mkfifo makes "named pipes" so they can behave like files.
>>>
>>> https://en.wikipedia.org/wiki/Named_pipe
>>>
>>> On Sat, Mar 13, 2021 at 12:44 AM Peng Yu  wrote:
>>>
>>>> Hi,
>>>>
>>>> It seems that both `mkfifo` and `mknod ... p` can create a fifo. What
>>>> is the difference between them? Thanks.
>>>>
>>>> --
>>>> Regards,
>>>> Peng
>>>>
>>>>
>>>
>>> --
>>> :wq
>>>
>> --
>> Regards,
>> Peng
>>
>
>
> --
> :wq
>
-- 
Regards,
Peng

difference between mkfifo and mknod ... p

2021-03-12 Thread Peng Yu

Hi,

It seems that both `mkfifo` and `mknod ... p` can create a fifo. What
is the difference between them? Thanks.

-- 
Regards,
Peng

How to ensure UTF-8 sort?

2020-12-06 Thread Peng Yu

Hi,

I want to make sure sort always uses UTF-8. But I am not sure what
locale is universally available on all OSes. Does anybody know the
correct way to make sure sort uses UTF-8 on all machines where
coreutils is installed? Thanks.
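
One portable option is the C locale, which is available everywhere and
compares bytes. For valid UTF-8 input, byte order is the same as Unicode
code point order, so the result is reproducible across machines, although
it is not a language-aware collation. The function name sort_codepoint is
hypothetical.

```shell
# Sort UTF-8 text by code point, independent of the installed locales.
sort_codepoint() {
    LC_ALL=C sort "$@"
}

printf '%s\n' 'é' 'z' 'a' 'Z' | sort_codepoint
# Z
# a
# z
# é
```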

-- 
Regards,
Peng

Better support of timezone abbreviation in `date`?

2020-10-09 Thread Peng Yu

Hi,

It looks like some time zone abbreviations are not supported by
`date`. For example, THA is not supported. Can a more comprehensive
support be added? Thanks.

https://www.timeanddate.com/time/zone/thailand

-- 
Regards,
Peng

What timezone strings are supported by `date`?

2020-10-09 Thread Peng Yu

Hi,

It seems that time zone strings like CET and PST are supported by `date`.
But I can't find a complete list of such strings supported by `date`.
Is there a doc that describes all of them? Thanks.

-- 
Regards,
Peng

What is the interpretation of bs of dd in terms of predicting the disk performance of other I/O bound programs?

2020-09-23 Thread Peng Yu

Hi,

Many people use dd to test disk performance. A key option of dd is bs,
and I understand what it literally means. But it is not clear how
the performance measured by dd with a specific bs maps to the disk
performance of other I/O bound programs. Could anybody let me know
how to interpret bs in terms of predicting the performance of
other I/O bound programs? Thanks.

-- 
Regards,
Peng

Re: How tail works on a large file?

2020-08-22 Thread Peng Yu

Do you mean that I need to run `hexedit the_large_file`? What is the
purpose of this? I don't quite understand.

On 8/22/20, Budi  wrote:
> use wxHexEditor or Curses Hexedit
> hit End to bring us to the tail
>
> On 8/22/20, Peng Yu  wrote:
>> Hi,
>>
>> I tried to tail a large file (2.8GB) to get its last 10 lines. It runs
>> very fast.
>>
>> How is this achieved? Does tail do it differently between a file
>> (random disk access) and a pipe (sequential disk access)? Thanks.
>>
>> --
>> Regards,
>> Peng
>>
>>
>


-- 
Regards,
Peng

How tail works on a large file?

2020-08-22 Thread Peng Yu

Hi,

I tried to tail a large file (2.8GB) to get its last 10 lines. It runs very fast.

How is this achieved? Does tail do it differently between a file
(random disk access) and a pipe (sequential disk access)? Thanks.

-- 
Regards,
Peng

../.. resolution of ls

2020-06-07 Thread Peng Yu

Hi,

It seems that ../../ cannot be resolved symbolically by ls. See the
following example. I'd like `ls ..` to print both a and b.
Unfortunately, it only prints b because it thinks it is in /tmp/i/a/b
instead of /tmp/i/b. Is there a way to use the symbolic pwd instead of the
absolute pwd? Thanks.

/tmp/i$ tree
.
├── a
│   └── b
└── b -> a/b/

/tmp/i$ cd b
/tmp/i/b$ ls -H ../
b
/tmp/i/b$ ls ../
b
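
ls resolves ".." through the kernel, which only knows the physical
directory. The shell, however, keeps the logical path in $PWD, so the
parent can be computed textually. A sketch; the function name
ls_logical_parent is hypothetical.

```shell
# List the logical parent of the current directory, i.e. the parent of the
# path used to get here, even when that path goes through a symlink.
ls_logical_parent() {
    ls "$@" -- "$(dirname -- "$PWD")"
}

# Usage (from /tmp/i/b in the example above): ls_logical_parent
```

Any ls options, such as -l, can be passed as arguments.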

-- 
Regards,
Peng

Re: Does -s apply to -m in sort?

2020-05-11 Thread Peng Yu

Are you the author of -m? If not, maybe the author of -m should know how
it works with -s? If not, maybe this should be documented anyway?

On Mon, May 11, 2020 at 5:01 PM Eric Blake  wrote:

> On 5/11/20 4:18 PM, Peng Yu wrote:
> > I used real files (already sorted) to test whether having -s or not
> > affects -m. But I have not made minimal example input files, so that is
> > why I am not sure about my conclusion.
> >
> > But the command to try is basically `sort -m -k sort_fields files...`
> > or `sort -s -m -k sort_fields files..`.
>
> That's closer - it shows a pseudo-command line you attempted.  But it
> still does not lend itself to reproducibility, because we don't know
> what 'sort_fields' you used, nor what 'files..' contain.
>
> You also didn't state whether you tried the --debug option, to see if
> the presence or absence of -s showed enough debugging crumbs to prove
> that you at least tried to analyze the problem yourself.  Nor did you
> mention whether you read the source code (it _is_ open source, after
> all, so instead of asking someone else to do your homework, _you_ can
> find the answer).
>
> >
> > I assume that, to the authors who made -m and -s, my question should be clear?
>
> Unfortunately, your assumption is wrong.  A clear question is one that
> includes actual examples, and not one that forces someone to reproduce
> the work that you could have already provided them.  Put it this way:
> suppose it took you 5 minutes to come up with a test case, and that
> there are 100 list readers interested in your problem, each of whom then
> take another 5 minutes to reproduce the setup from your vague
> description.  Then you have cost 505 minutes of collective time; and
> your original work plus the work of each reader results in a very low
> signal-to-noise ratio (5/505 is less than 1% new discoveries, and more
> than 99% rehash).  But if it takes you an additional 5 minutes to polish
> your query into an email that can then be copy-pasted into a terminal so
> that each reader can reproduce the problem in 5 seconds, then your
> initial 10 minutes of effort (which is indeed twice the work on your
> part) plus 500 seconds of list readers' time results in a much better
> ratio of useful new work (5/18.3 is > 27%).  Although it costs you more,
> your efforts to make everyone else's life easier is magnified by the
> number of readers benefitted by your extra efforts.  (And that's why I'm
> spending so long in writing my reply - to try to teach you that your
> historical style of questioning leaves a lot to be desired, as well as a
> possibly-futile attempt on my part to get you to recognize that the more
> effort YOU put into a good bug report, the less likely you are to be
> habitually ignored as someone who merely wastes time).
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
>
> --
Regards,
Peng

Re: Does -s apply to -m in sort?

2020-05-11 Thread Peng Yu

I used real files (already sorted) to test whether having -s or not
affects -m. But I have not made minimal example input files, so that is
why I am not sure about my conclusion.

But the command to try is basically `sort -m -k sort_fields files...`
or `sort -s -m -k sort_fields files..`.

I assume that, to the authors who made -m and -s, my question should be clear?
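
A minimal reproducible case for this question might look as follows. It is
a sketch with made-up file names, assuming GNU sort. Both files are sorted
on the key (column 1) and the two lines have equal keys, so the only
difference between the runs is how ties are resolved.

```shell
dir=$(mktemp -d)
printf 'a 2\n' > "$dir/f1"
printf 'a 1\n' > "$dir/f2"

# Without -s, equal keys fall back to a whole-line comparison.
LC_ALL=C sort    -m -k1,1 "$dir/f1" "$dir/f2"
# With -s, equal keys keep the order of the input files.
LC_ALL=C sort -s -m -k1,1 "$dir/f1" "$dir/f2"
```

With GNU sort I would expect the first command to print "a 1" before "a 2"
and the second to print "a 2" before "a 1", which would mean that -s does
affect -m.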

On 5/11/20, Eric Blake  wrote:
> On 5/9/20 4:31 PM, Peng Yu wrote:
>> It seems that -s of sort is not useful when -m is used based on my
>> simple test case. But I am not completely sure. Could anybody let me
>> know if this is the case? Thanks.
>
> Without seeing your simple test case, I cannot presume to know what you
> tried or failed to try in making your determination, and lack the
> information necessary to repeat the experiment myself.  Help us help you
> - when asking a question, give us enough relevant details, including the
> contents of the files and the command line you attempted your experiment
> with.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
>

-- 
Regards,
Peng

Does -s apply to -m in sort?

2020-05-09 Thread Peng Yu

It seems that -s of sort is not useful when -m is used based on my
simple test case. But I am not completely sure. Could anybody let me
know if this is the case? Thanks.

-- 
Regards,
Peng

How to ls a directory with the directory path prepended?

2020-04-27 Thread Peng Yu

When I `ls` a directory, the content will be shown without the
directory path. Is there an option of `ls` to prepend the directory
path?

Note that I am not looking for this way, as it involves shell.

ls d/*

Thanks.
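
As far as I know ls has no such option, but find prints each entry with the
starting path prepended and does not need a shell glob. A sketch assuming a
find that supports -mindepth and -maxdepth (GNU and BSD find do); the
function name ls_with_path is hypothetical.

```shell
# List the entries of a directory with the directory path prepended,
# skipping dot files as plain ls does.
ls_with_path() {
    find "$1" -mindepth 1 -maxdepth 1 ! -name '.*' | LC_ALL=C sort
}

# Usage: ls_with_path d
```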

-- 
Regards,
Peng

Which sha sum is the fastest?

2020-04-27 Thread Peng Yu

I got the following run time on a file of 116M.

They are ranked in this order. Is this runtime order in general true?

sha1sum < sha384sum <~ sha512sum < sha256sum <~ sha224sum

==> sha1sum <==

real    0m0.330s
user    0m0.275s
sys     0m0.042s
==> sha224sum <==

real    0m0.679s
user    0m0.640s
sys     0m0.029s
==> sha256sum <==

real    0m0.668s
user    0m0.633s
sys     0m0.027s
==> sha384sum <==

real    0m0.380s
user    0m0.350s
sys     0m0.027s
==> sha512sum <==

real    0m0.388s
user    0m0.354s
sys     0m0.027s


-- 
Regards,
Peng

altchars for base64

2020-03-14 Thread Peng Yu

Hi,

Python base64 decoder has the altchars option.

https://docs.python.org/3/library/base64.html
base64.b64decode(s, altchars=None, validate=False)¶

But I don't see such an option in coreutils' base64. Can this option
be added? Thanks.
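
Until such an option exists, the alternative characters can be translated
back to '+' and '/' with tr before decoding. A sketch; the function name
b64decode_alt is hypothetical, and its argument plays the role of Python's
altchars (two characters standing in for '+' and '/').

```shell
# Decode base64 that uses two alternative characters instead of + and /.
b64decode_alt() {
    tr -- "$1" '+/' | base64 -d
}

# URL-safe alphabet: '-' replaces '+', '_' replaces '/'.
printf '%s' '_-8=' | b64decode_alt '-_' | od -An -tx1
```

For the URL-safe alphabet specifically, newer coreutils releases also ship
`basenc --base64url`.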

-- 
Regards,
Peng

sort by hex number?

2020-03-05 Thread Peng Yu

I have a TSV file with a column in hex format, e.g., 0x1a000, 0x17000, 0xe000.

Is there a way to sort the rows by this column in hex? Thanks.
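
One generic approach is decorate, sort, undecorate: prepend the decimal
value of the hex column, sort numerically on it, then strip it again. A
sketch that relies on the shell's printf accepting 0x-prefixed constants
for %d; the function name sort_by_hex_column is hypothetical. It starts one
cut process per line, so it is slow on large files.

```shell
# Sort TSV rows read from stdin by the hex value in the given column.
sort_by_hex_column() {
    col=$1
    while IFS= read -r line; do
        hex=$(printf '%s\n' "$line" | cut -f "$col")
        printf '%d\t%s\n' "$hex" "$line"      # decimal key, then original row
    done | sort -n -k1,1 | cut -f 2-
}

printf '0x1a000\tx\n0x17000\ty\n0xe000\tz\n' | sort_by_hex_column 1
```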

-- 
Regards,
Peng

Re: What is the difference between unlink and rm -f?

2020-01-29 Thread Peng Yu

So a one-line summary is

When the target can be deleted, unlink and rm -f are the same;
otherwise, unlink will complain about the error and exit with 1, but
rm -f will do neither.
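
The difference in exit status can be seen with a missing file (one case
where the target cannot be deleted), assuming GNU coreutils:

```shell
dir=$(mktemp -d)

# unlink reports the error and exits with status 1.
unlink "$dir/missing"; echo "unlink exit status: $?"

# rm -f stays silent about the missing file and exits with status 0.
rm -f "$dir/missing"; echo "rm -f exit status: $?"
```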

On 1/29/20, Kaz Kylheku (Coreutils) <962-396-1...@kylheku.com> wrote:
> On 2020-01-29 01:45, Peng Yu wrote:
>> Hi,
>>
>> It seems to me unlink and rm -f are the same if the goal is to delete
>> files. When are they different? Thanks.
>
> I answered this on Unix Stackexchange in 2016:
>
> https://unix.stackexchange.com/a/326711/16369
>
> :)
>


-- 
Regards,
Peng

Re: Show directory time as the latest time of the file in the directory (including subdirs)

2020-01-29 Thread Peng Yu

No. -t just shows the time of the directory itself. I want a summary
time which is the latest time of all the contents (including the ones
in the subdirectories, sub-subdirectories, ...) in the directory.
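
Such a summary can be computed with GNU find rather than ls. A sketch; the
function name newest_under is hypothetical, it only looks at regular files,
and it assumes file names without newlines.

```shell
# Print the modification time and path of the most recently modified
# regular file anywhere below the given directory.
newest_under() {
    find "$1" -type f -printf '%T@ %TY-%Tm-%Td %TH:%TM %p\n' |
        sort -n | tail -n 1 | cut -d ' ' -f 2-
}

# Usage: newest_under some_dir
```

The leading %T@ field (seconds since the epoch) is used only as the sort
key and is removed by cut.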

On 1/29/20, Bernhard Voelker  wrote:
> On 1/29/20 10:58 AM, Peng Yu wrote:
>> Hi,
>>
>> For directories, ls shows the time of the directory itself.
>> Sometimes, it is more important to show the latest time of files in
>> the directory in addition to the directory time.
>>
>> Is there an easy way to show such information? Thanks.
>
> I'm afraid I don't understand fully what you want to achieve.
> Please give a small example.
> Do you mean the -t option of 'ls'?
>
> Have a nice day,
> Berny
>


-- 
Regards,
Peng

Show directory time as the latest time of the file in the directory (including subdirs)

2020-01-29 Thread Peng Yu

Hi,

For directories, ls shows the time of the directory itself.
Sometimes, it is more important to show the latest time of files in
the directory in addition to the directory time.

Is there an easy way to show such information? Thanks.

-- 
Regards,
Peng

What is the difference between unlink and rm -f?

2020-01-29 Thread Peng Yu

Hi,

It seems to me unlink and rm -f are the same if the goal is to delete
files. When are they different? Thanks.

-- 
Regards,
Peng

how to use touch to change the change time?

2020-01-19 Thread Peng Yu

Hi,

I don't see how to change the change time (ctime) with touch. Is it
possible with touch? Thanks.

   --time=WORD
          change the specified time: WORD is access, atime, or use:
          equivalent to -a; WORD is modify or mtime: equivalent to -m
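
As the excerpt suggests, touch only sets the access and modification times.
The change time is maintained by the kernel and is set to the current time
whenever the file's metadata changes, including by touch itself. A small
demonstration, assuming GNU touch and stat:

```shell
f=$(mktemp)

# Set atime and mtime to a date in the past.
touch -d '2000-01-01 00:00:00' "$f"

# mtime shows the year 2000, but ctime is the moment touch ran.
stat -c 'mtime=%y ctime=%z' "$f"
```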


-- 
Regards,
Peng

Re: How to implement the V comparison used by sort in python?

2019-10-26 Thread Peng Yu

Are you sure they are 100% compatible with V? I don’t want to use them only
to find later that they are not 100% compatible.

On Sat, Oct 26, 2019 at 4:24 PM Assaf Gordon  wrote:

> Hello,
>
>
> On Oct 25, 2019, at 8:00 PM, Peng Yu  wrote:
>
>
> I'd like to mimic the V sort order in python. Is there any easy to use
> comparison available in python?
>
>
> A simple online search will show several python packages that can do it.
> For example:
>
> https://deb-pkg-tools.readthedocs.io/en/latest/api.html#module-deb_pkg_tools.version
>
> -assaf
>
-- 
Regards,
Peng

How to implement the V comparison used by sort in python?

2019-10-25 Thread Peng Yu

Hi,

I'd like to mimic the V sort order in python. Is there any easy to use
comparison available in python? The following implementation is simple
but it is not exactly the same as the sort order of V used in sort.
Thanks.

https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

-- 
Regards,
Peng

Can natural sort support be added?

2019-10-08 Thread Peng Yu

Hi,

Since natural sort is provided in a few languages (as mentioned in the
Wikipedia page), can it be supported by `sort` in addition to
version-sort?

https://en.wikipedia.org/wiki/Natural_sort_order

-- 
Regards,
Peng

Re: Is natural sort supported?

2019-10-08 Thread Peng Yu

> At the risk of arguing over semantics,
> I'll say again: there is no "one correct" natural order standard,
> and therefore it is not "plain and simple" because there is no just
> "one" such order.

I don't agree that there is no commonly accepted "natural sort". For
example, I found another one that uses the same order as the python
one that I showed above. The so-called version sort in coreutils' sort
is just not natural sort and it should not be called natural sort.

$ printf '%s\n' G . | csvtk sort -k 1:N
G
.
$ printf '%s\n' 1G 1. | csvtk sort -k 1:N
1G
1.
$ printf '%s\n' 1G13 1.02 | csvtk sort -k 1:N
1G13
1.02

Wikipedia also explains what natural sort is and provides a few
implementation links. I don't think any of them implements the
version sort as the natural sort.

https://en.wikipedia.org/wiki/Natural_sort_order

> and note that even the above blog writes:
> "... Don't let Ned's clever Python ten-liner fool you. Implementing a
> natural sort is more complex than it seems ... ".

I don't understand this sentence. There is an implementation with just
a few lines in python. Unless this implementation is wrong, then there
is a simple implementation at least in python.

-- 
Regards,
Peng

Re: Is natural sort supported?

2019-10-08 Thread Peng Yu

Some part of the manual is also poorly written.

"1.1.2 Origin of version sort and differences from natural sort"

After reading the above section, I am still not clear what is the
difference. It is better to show some examples to illustrate the
difference.

On 10/8/19, Peng Yu  wrote:
> Then, the option name is misleading. -V is actually
> --debian-version. And it is not natural order (there is no such thing
> like extension handling with natural order). The natural order is
> plain and simple, just as what is explained below, which can be
> implemented by a few lines of python code.
>
> https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/
>
> So my question is whether natural order as in the above URL is supported?
>
> On 10/8/19, Assaf Gordon  wrote:
>> Hello,
>>
>> On 2019-10-08 12:36 a.m., Peng Yu wrote:
>>> The following example shows that version sort is not natural sort. Is
>>> natural sort supported by `sort`?
>>
>> There is no such thing as "THE correct natural sort" order...
>>
>>> $ printf '%s\n' 1G13 1.02 | LC_ALL=C sort -k 1,1V # The result order should have been reversed.
>>
>> ... therefore "should have" is simply incorrect expectation.
>>
>> You might think it "should" be one way, and other implementations
>> think it "should" be another way.
>>
>> For more details, please see the attached HTML file for details.
>>
>> (this HTML file is a new chapter of the coreutils manual that will be
>> included in the next release. The source texinfo is here:
>> https://git.savannah.gnu.org/cgit/coreutils.git/tree/doc/sort-version.texi
>> ).
>>
>> regards,
>>   - assaf
>>
>>
>
>
> --
> Regards,
> Peng
>


-- 
Regards,
Peng

Re: Is natural sort supported?

2019-10-08 Thread Peng Yu

Then, the option name is misleading. -V is actually
--debian-version. And it is not natural order (there is no such thing
like extension handling with natural order). The natural order is
plain and simple, just as what is explained below, which can be
implemented by a few lines of python code.

https://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

So my question is whether natural order as in the above URL is supported?

On 10/8/19, Assaf Gordon  wrote:
> Hello,
>
> On 2019-10-08 12:36 a.m., Peng Yu wrote:
>> The following example shows that version sort is not natural sort. Is
>> natural sort supported by `sort`?
>
> There is no such thing as "THE correct natural sort" order...
>
>> $ printf '%s\n' 1G13 1.02 | LC_ALL=C sort -k 1,1V # The result order
>> should have been reversed.
>
> ... therefore "should have" is simply incorrect expectation.
>
> You might think it "should" be one way, and other implementations
> think it "should" be another way.
>
> For more details, please see the attached HTML file for details.
>
> (this HTML file is a new chapter of the coreutils manual that will be
> included in the next release. The source texinfo is here:
> https://git.savannah.gnu.org/cgit/coreutils.git/tree/doc/sort-version.texi
> ).
>
> regards,
>   - assaf
>
>


-- 
Regards,
Peng

Is natural sort supported?

2019-10-07 Thread Peng Yu

Hi,

The following example shows that version sort is not natural sort. Is
natural sort supported by `sort`?

$ printf '%s\n' G . | LC_ALL=C sort -k 1,1V
.
G
$ printf '%s\n' 1G 1. | LC_ALL=C sort -k 1,1V
1.
1G
$ printf '%s\n' 1G13 1.02 | LC_ALL=C sort -k 1,1V # The result order
should have been reversed.
1G13
1.02

-- 
Regards,
Peng

Re: How to sort unicode properly?

2019-09-25 Thread Peng Yu

If Python can have pyuca, which works across platforms, why can't such a
thing exist at the C level?

On Wed, Sep 25, 2019 at 12:24 PM Eric Blake  wrote:

> On 9/25/19 10:56 AM, Peng Yu wrote:
> > I want to make my `sort` to be machine-independent and always use the
> > correct Unicode sort order. Is there a way to do so?
>
> Those two goals are somewhat at odds.  The only truly portable
> machine-independent sorting is the one guaranteed by POSIX when you use
> LC_ALL=C (fun fact: even on an EBCDIC machine, that is required by POSIX
> to collate in ASCII order, rather than native byte order).  The moment
> you use any other locale, then you are not only left to the mercies of
> whoever wrote that locale, but also stuck with the fact that there is no
> portable way to transfer locale definitions from one vendor's libc to
> another.
>
> >
> > I don't know how to check where en_US.UTF-8 comes from. Do you know
> > how to check it? (I use Mac OS X.)
>
> All other locales are somewhat vendor-dependent; as you've discovered,
> your vendor (Apple) has a rather gaping hole in their locale support.
> But because Apple is a closed-source shop, it will have to be Apple that
> fixes their bug, unless you want to take on the gargantuan task of
> writing a gnulib module that provides locale tables to mirror glibc for
> use on non-glibc machines.
>
> Note that glibc doesn't have that problem, at least on my system:
>
> $ cat /etc/fedora-release
> Fedora release 30 (Thirty)
> $ rpm -q glibc
> glibc-2.29-22.fc30.x86_64
> $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8  sort --debug
> sort: text ordering performed using ‘en_US.UTF-8’ sorting rules
> cafe
> ____
> café
> ____
> caff
> ____
>
> So one option you could pursue is switching to an operating system that
> does not curtail your freedoms.
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
-- 
Regards,
Peng

Re: How to sort unicode properly?

2019-09-25 Thread Peng Yu

I want to make my `sort` to be machine-independent and always use the
correct Unicode sort order. Is there a way to do so?

I don't know how to check where en_US.UTF-8 comes from. Do you know
how to check it? (I use Mac OS X.)

On 9/25/19, Eric Blake  wrote:
> On 9/25/19 10:20 AM, Peng Yu wrote:
>> Hi,
>>
>> It seems that "café" should be sorted before "caff" in Unicode.
>>
>> https://github.com/jtauber/pyuca
>>
>> But `sort` does not do so.
>>
>> $ printf '%s\n' cafe caff café | LC_ALL=UTF8  sort
>> cafe
>> caff
>> café
>> $ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8  sort
>> cafe
>> caff
>> café
>>
>> How to make `sort` sort according to Unicode order? Thanks.
>
> You'll have to write a locale definition where strcoll() sorts in the
> order you want.  Coreutils sort is calling strcoll(), and if it doesn't
> sort the way you think it should, the bug is in your locale and not in
> coreutils.  You'll want to report this issue to whoever provided your
> en_US.UTF-8 locale (perhaps glibc?)
>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>


-- 
Regards,
Peng

How to sort unicode properly?

2019-09-25 Thread Peng Yu

Hi,

It seems that "café" should be sorted before "caff" in Unicode.

https://github.com/jtauber/pyuca

But `sort` does not do so.

$ printf '%s\n' cafe caff café | LC_ALL=UTF8  sort
cafe
caff
café
$ printf '%s\n' cafe caff café | LC_ALL=en_US.UTF-8  sort
cafe
caff
café

How to make `sort` sort according to Unicode order? Thanks.
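
A rough, locale-independent approximation can be sketched with the
Python standard library alone. This is not the Unicode Collation
Algorithm that pyuca implements; it only compares base letters first and
uses the original string as a tie-break. The name
`accent_insensitive_key` is made up for this example.

```python
import unicodedata

def accent_insensitive_key(s):
    # Decompose (NFD), then drop combining marks so 'é' compares like 'e'.
    decomposed = unicodedata.normalize('NFD', s)
    base = ''.join(c for c in decomposed if not unicodedata.combining(c))
    # The original string breaks ties, so 'cafe' sorts before 'café'.
    return (base, s)

print(sorted(['cafe', 'caff', 'café'], key=accent_insensitive_key))
```

For these three words the order is cafe, café, caff, independent of the
platform's locale tables.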

-- 
Regards,
Peng

Can -f of seq take an integer format?

2019-08-01 Thread Peng Yu

Hi,

I only find %.0f to print integers. But it is just a float with no
digits after the point. Is there a real integer format in seq? Thanks.

$ seq -f '%.0f minutes' 2563199 2563200
2563199 minutes
2563200 minutes
$ seq -f '%g minutes' 2563199 2563200
2.5632e+06 minutes
2.5632e+06 minutes
$ seq -f '%d minutes' 2563199 2563200
seq: format ‘%d minutes’ has unknown %d directive

-- 
Regards,
Peng

How to convert a md5sum back to a timestamp?

2019-07-31 Thread Peng Yu

Hi,

Suppose that I know an md5sum that is derived from one of the timestamps
computed below. Is there a way to quickly recover the original
timestamp? I could build a database of all the timestamps and their
md5sums, but as the total number of entries increases, this solution
will not scale because the database can become large. Is there any
better solution to this problem?

for i in {1..2563200}; do date -d "-$i minutes" +%Y%m%d_%I%M%p; done
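
One way to avoid the database is to recompute the candidates on demand
and stop at the first match. The sketch below assumes the md5 was taken
over the timestamp text without a trailing newline and that the
reference time `now` is known; `find_timestamp` is a hypothetical helper
name.

```python
import hashlib
from datetime import datetime, timedelta

FORMAT = '%Y%m%d_%I%M%p'  # same format string as the date command above

def find_timestamp(target_md5, now, max_minutes=2563200):
    # Recompute each candidate on the fly; nothing is stored.
    for i in range(1, max_minutes + 1):
        stamp = (now - timedelta(minutes=i)).strftime(FORMAT)
        if hashlib.md5(stamp.encode()).hexdigest() == target_md5:
            return stamp
    return None

now = datetime(2019, 7, 31, 12, 0)
digest = hashlib.md5(b'20190730_0300PM').hexdigest()
print(find_timestamp(digest, now, max_minutes=5000))
```

Memory use stays constant; the cost is one md5 per candidate, which is
small next to starting `date` once per minute in a shell loop.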

-- 
Regards,
Peng

How to list not only content in a directory but the directory itself as well?

2019-07-09 Thread Peng Yu

Hi

`ls somedir` without -d will show the content of a directory. With -d,
it will show the info of the directory itself. Is there a way to show
both in a single command? Thanks.

-- 
Regards,
Peng

Re: Is there a way to gzip the temp file used by `sort`?

2019-07-01 Thread Peng Yu

Thanks. Does this option affect the -m option? Thanks.

On 7/1/19, Ed  wrote:
> On 2019-07-01 10:44-0500, Peng Yu wrote:
>> Hi,
>>
>> The temp files used by `sort` are not gzipped. Is there a way to use
>> gzip to save the space used by the temp files? Thanks.
>
> Did you try --compress-program=gzip?
>
> --
> Best regards,
> Ed http://www.s5h.net/
>
>


-- 
Regards,
Peng

Is there a way to gzip the temp file used by `sort`?

2019-07-01 Thread Peng Yu

Hi,

The temp files used by `sort` are not gzipped. Is there a way to use
gzip to save the space used by the temp files? Thanks.

-- 
Regards,
Peng

How to print sizes of both files and directories in a directory?

2019-06-30 Thread Peng Yu

Hi,

`du -h --max-depth=1` only print directory sizes. Is there a way to
print the sizes of both directories and files in a directory? Thanks.

-- 
Regards,
Peng

Re: How to sort and count efficiently?

2019-06-30 Thread Peng Yu

The problem with this kind of awk program is that everything is loaded
into memory, whereas bare `sort` uses external files to save memory. When the hash
in awk is too large, accessing it can become very slow (maybe due to
cache misses or the hash slowing down as it grows).

On Sun, Jun 30, 2019 at 11:52 AM Assaf Gordon  wrote:

> Correcting myself:
>
> On Sun, Jun 30, 2019 at 10:08:46AM -0600, Assaf Gordon wrote:
> > On Sun, Jun 30, 2019 at 07:34:19AM -0500, Peng Yu wrote:
> > >
> > > I have a long list of string (each string is in a line). I need to
> > > count the number of appearance for each string.
> > >
> > > [...] Does anybody know any better way
> > > to make the sort and count run more efficiently?
> > >
> >
> > Or using gnu awk:
>
> use 'asorti' instead of 'asort', with the two-parameter variant:
>
>
>   $ printf "%s\n" a c b b b b b b c \
>       | awk 'a[$1]++ {}
>              END { n = asorti(a,b)
>                    for (i = 1; i <= n; i++) {
>                       print b[i], a[b[i]]
>                    }
>              }'
>   a 1
>   b 6
>   c 2
>
>
> For more details see:
>
> https://www.gnu.org/software/gawk/manual/html_node/Array-Sorting-Functions.html#Array-Sorting-Functions
>
> -assaf
>
> --
Regards,
Peng

How to sort and count efficiently?

2019-06-30 Thread Peng Yu

Hi,

I have a long list of strings (one string per line). I need to
count the number of occurrences of each string.

I currently use `sort` to sort the list and then use another program
to do the count. The second program doing the count needs only a small
amount of the memory as the input is sorted.

But `sort` writes a lot of temp files like `sortjISjDY`, which are
very large. Because I only need the counts, ideally I'd like these
temp files to keep only the count and one copy of each string, rather
than the same string many times. Does anybody know a better way
to make the sort and count run more efficiently?
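
The counting step on sorted input, as described above, can be sketched
like this (similar in spirit to `sort | uniq -c`). The name
`count_sorted` is made up for this example.

```python
from itertools import groupby

def count_sorted(lines):
    # Input must already be sorted; only the current run is examined,
    # so memory use does not grow with the number of lines.
    for value, run in groupby(lines):
        yield value, sum(1 for _ in run)

for value, n in count_sorted(['a', 'b', 'b', 'b', 'c', 'c']):
    print(value, n)
```

This does not shrink the temp files that `sort` writes; it only shows
why the second stage needs so little memory.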

-- 
Regards,
Peng

Does --parallel apply to merge sort?

2019-06-11 Thread Peng Yu

Hi,

It seems that there is no need to use parallelization for merge sort.
So I think that the following option of `sort` only applies to
regular sort but not to merge sort. Is it so?

   --parallel=N
  change the number of sorts run concurrently to N

-- 
Regards,
Peng

Re: How to calculate date relative to another date?

2019-05-21 Thread Peng Yu

> Seems to work fine when date specification is not quite as ambiguous
> as "2018/05".
>
> $ date --iso --date='2018-05-01 5 years ago'
> 2013-05-01

What is special about --iso? If I use the following date string, I get
a future time. Why?

$ date --date='2018-05-01 4 years 11 months ago' +%Y%m
202106
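
One reading that is consistent with the output above: `ago` negates only
the unit it follows, so the string means plus 4 years and minus 11
months. The arithmetic can be sketched as follows; `add_months` is a
hypothetical helper, not part of date.

```python
def add_months(year, month, delta):
    # Work with a zero-based month count so divmod handles year rollover.
    total = year * 12 + (month - 1) + delta
    y, m = divmod(total, 12)
    return y, m + 1

# '4 years 11 months ago' read as +4 years and -11 months:
print(add_months(2018, 5, 4 * 12 - 11))     # (2021, 6)
# Both units negated, as presumably intended:
print(add_months(2018, 5, -(4 * 12 + 11)))  # (2013, 6)
```

Under that reading, writing `ago` after each unit ("4 years ago 11
months ago") should negate both.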

-- 
Regards,
Peng

How to calculate date relative to another date?

2019-05-21 Thread Peng Yu

Hi,

For example, I want to calculate 5 years less a month from May 2018,
i.e., "2018/05", the result should be "2013/06".

https://www.gnu.org/software/coreutils/manual/html_node/Examples-of-date.html

I don't think the direct calculation of this kind of relative date is
possible with coreutils' date command. Some kind of external
arithmetic calculation must be used. Is it so?

-- 
Regards,
Peng

Re: Why TAB in ansi color is not recognized?

2019-04-28 Thread Peng Yu

Thanks. Where does the `[ K` come from? I only see `[ m` but not `[ K`.
What does `[ K` mean? Thanks.

http://pueblo.sourceforge.net/doc/manual/ansi_color_codes.html

On Sun, Apr 28, 2019 at 2:49 PM Assaf Gordon  wrote:
>
> Hello,
>
> On 2019-04-28 11:23 a.m., Peng Yu wrote:
> >
> > In the 2nd example, it is not sorted as what I want. Why is it so?
> >
> > $ printf '%s\t%s\n' a 1 a 2 |grep --color=always a | sort -k 2,2nr
> > a 2
> > a 1
> > $ printf '%s\t%s\n' a 1 a 2 | grep --color=always a$'\t' | sort -k 2,2nr
> > a 1
> > a 2
> >
>
> The 'grep' in the second example highlights *both* the 'a' character
> and the 'tab' character.
>
> This means that the ANSI sequence to restore color (\033 [ m \033  [ K)
> appears *after* the tab, and is then parsed by 'sort' as the beginning
> of the second field:
>
> $ printf '%s\t%s\n' a 1 a 2 | grep --color=always a$'\t' | od -tc -An
>   033   [   0   1   ;   3   1   m 033   [   K   a  \t 033   [   m
>   033   [   K   1  \n 033   [   0   1   ;   3   1   m 033   [   K
> a  \t 033   [   m 033   [   K   2  \n
>
> And annotated:
>
> First line, first field:
> 033   [   0   1   ;   3   1   m 033   [   K   a
> \t
>
> First line, second field:
> 033   [   m 033   [   K   1
> \n
>
> Second line, first field:
> 033   [   0   1   ;   3   1   m 033   [   K   a
> \t
>
> Second line, second field:
> 033   [   m 033   [   K   2
> \n
>
>
> Using "--debug" will give a hint as to what went wrong:
>
>$ printf '%s\t%s\n' a 1 a 2 | grep --color=always a$'\t' \
> | sort -k2,2nr --debug
>sort: using ‘en_CA.utf8’ sorting rules
>a>1
> ^ no match for key
>
>a>2
>   ^ no match for key
>
>
>
> The "no match for key" message means that the 2nd field failed to be
> parsed as a numeric value.
>
>
>
> regards,
>   -assaf
>


-- 
Regards,
Peng

Why TAB in ansi color is not recognized?

2019-04-28 Thread Peng Yu

Hi,

In the 2nd example, the output is not sorted the way I want. Why is it so?

$ printf '%s\t%s\n' a 1 a 2 |grep --color=always a | sort -k 2,2nr
a   2
a   1
$ printf '%s\t%s\n' a 1 a 2 | grep --color=always a$'\t' | sort -k 2,2nr
a   1
a   2
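
`grep --color=always` wraps each match in escape sequences, and when the
tab is part of the match the reset sequence lands at the start of the
second field. A sketch of stripping those sequences before comparing,
assuming only CSI-style sequences such as ESC[01;31m and ESC[K appear;
`ANSI_ESCAPE` and `sort_key` are names made up for this example.

```python
import re

# CSI sequences: ESC [ parameters final-letter
ANSI_ESCAPE = re.compile(r'\x1b\[[0-9;]*[A-Za-z]')

def sort_key(line):
    # Remove color codes, then use the second tab-separated field as a number.
    fields = ANSI_ESCAPE.sub('', line).split('\t')
    return float(fields[1])

lines = ['\x1b[01;31m\x1b[Ka\t\x1b[m\x1b[K1',
         '\x1b[01;31m\x1b[Ka\t\x1b[m\x1b[K2']
for line in sorted(lines, key=sort_key, reverse=True):
    print(ANSI_ESCAPE.sub('', line))
```

The simpler fix in a pipeline is usually to sort first and colorize
last.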


-- 
Regards,
Peng

Is it possible to dd from a position in a file to the end?

2019-02-19 Thread Peng Yu

Hi, I don't see a way to specify "END" in dd. I don't want to count
the length a file in another command. Is there a way to let dd dump
from a given location to the end? Thanks.
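
With dd itself, leaving out `count=` makes it copy until end of file.
The same idea written in Python rather than dd, as a sketch; `dump_from`
is a hypothetical helper.

```python
import shutil
import sys

def dump_from(path, offset, out=None):
    # Seek to the byte offset, then copy everything up to end of file.
    if out is None:
        out = sys.stdout.buffer
    with open(path, 'rb') as f:
        f.seek(offset)
        shutil.copyfileobj(f, out)
```

No file length is needed: the copy simply stops when the read returns
no more data.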

-- 
Regards,
Peng

tail -f finish upon another process finish writing to the file

2019-01-21 Thread Peng Yu

Hi,

I use tail -f to show a file as it grows. However, if the process
which writes to the file is finished, tail -f will still wait there.
Is there a way to let tail -f finish once it detects nobody writes to
the file? Thanks.

-- 
Regards,
Peng

What tricks used in readlink to make it faster than realpath bash loadable?

2018-12-13 Thread Peng Yu

Hi,

`readlink` is faster than `realpath` for a large number of input
arguments. Note that the former starts slower than the latter. What
tricks are used in readlink to make it faster? Thanks.

https://github.com/bminor/bash/blob/master/examples/loadables/realpath.c

bash> builtin enable -f
~/Downloads/bash-4.4/examples/loadables/realpath realpath
bash> type realpath
realpath is a shell builtin
bash> type readlink
readlink is /usr/local/opt/coreutils/libexec/gnubin/readlink
bash> readlink -e . > /dev/null

real    0m0.014s
user    0m0.003s
sys     0m0.006s
bash> realpath . > /dev/null

real    0m0.003s
user    0m0.001s
sys     0m0.002s
bash> readlink -e $(printf '. %.0s' {1..1}) > /dev/null

real    0m0.200s
user    0m0.078s
sys     0m0.121s
bash> realpath $(printf '. %.0s' {1..1}) > /dev/null

real    0m0.211s
user    0m0.105s
sys     0m0.103s

-- 
Regards,
Peng

Understanding stdbuf

2018-11-14 Thread Peng Yu

I thought that the -oL option will wait until a line is finished in
the line buffer. So I'd expect the following output of stdbuf -oL -eL
./script.sh.

abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz

But the actual results are interleaved. Could anybody help me
understand how stdbuf works? Thanks.

$ cat script.sh
#!/usr/bin/env bash
# vim: set noexpandtab tabstop=2:

for x in {a..z}
do
	echo -n "$x"
	echo -n "$x" >&2
done
echo
echo >&2
$ stdbuf -oL -eL ./script.sh
aabbccddeeffgghhiijjkkllmmnnooppqqrrssttuuvvwwxxyyzz

$

-- 
Regards,
Peng

Re: performance bug of `wc -m`

2018-05-13 Thread Peng Yu

$ wc --version
wc (GNU coreutils) 8.29
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Paul Rubin and David MacKenzie.
$ seq 1000000 | time wc -m
6888896
2.69 real 2.52 user 0.03 sys
$ seq 1000000 | time ./wcm.py
6888896
1.30 real 1.18 user 0.04 sys

On Sun, May 13, 2018 at 12:54 PM, Assaf Gordon  wrote:
> Hello,
>
> On Sun, May 13, 2018 at 09:05:47AM -0400, Peng Yu wrote:
>> I am on Mac not on Linux. On Linux, I can confirm that `wc -m` is much
>> faster than `wcm.py`.
>
> As a first step, please run "wc --version" to confirm you are using
> gnu coreutils' wc and not the macos native wc program.
>
>> Here is the output on Mac.
>>
>> $ seq 1000000 > num.txt
>> $ time wc -m < num.txt
>> 6888896
>>
>> real    0m2.751s
>> user    0m2.622s
>> sys     0m0.042s
>> $  time ./wcm.py < num.txt
>> 6888896
>>
>> real    0m1.401s
>> user    0m1.234s
>> sys     0m0.051s
>
> Assuming it is coreutils' wc, I suspect file caching still plays
> a significant role here.
>
> Can you try:
>
>    seq 1000000 | time wc -m
>    seq 1000000 | time ./wcm.py
>
> And report the timing ?
>
> regards,
>  - assaf



-- 
Regards,
Peng

Re: performance bug of `wc -m`

2018-05-13 Thread Peng Yu

I am on Mac not on Linux. On Linux, I can confirm that `wc -m` is much
faster than `wcm.py`.

Here is the output on Mac.

$ seq 1000000 > num.txt
$ time wc -m < num.txt
6888896

real    0m2.751s
user    0m2.622s
sys     0m0.042s
$  time ./wcm.py < num.txt
6888896

real    0m1.401s
user    0m1.234s
sys     0m0.051s
$ cat wcm.py
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:

import sys
l = 0
for line in sys.stdin:
	l += len(line.decode('utf-8'))
print l


On Sun, May 13, 2018 at 2:18 AM, Assaf Gordon  wrote:
> Hello,
>
> On 12/05/18 07:55 PM, Peng Yu wrote:
>>
>> The following example shows that `wc -m` is even slower than the
>> equivalent Python code. Can this performance bug be fixed?
>
>
> I'm unable to reproduce the performance issue,
> and suspect other issues are at play.
>
> First:
>>
>> import sys
>> l = 0
>> for line in sys.stdin:
>>  l += len(line.rstrip('\n').decode('utf-8'))
>> print l
>
>
> This code is not identical to "wc -m" - it does not count the newlines
> as characters. Example:
>
>   $ seq 10 | wc -m
>   21
>   $ seq 10 | ./wcm.py
>   11
>
>> $ time ./wcm.py < 1.txt
>> 6786930
>> $ time wc -m < 1.txt
>> 6796930
>
>
> The fact that you are getting the exact same results indicates that your
> input file (1.txt) does not have newlines at all:
>
>   $ seq 10 | tr -d '\n' | ./wcm.py
>   11
>   $ seq 10 | tr -d '\n' | wc -m
>   11
>
>
> Second:
> I suspect the OS's file caching plays a big role in the skewed results.
> It would be better to clear the cache and then time it:
>
>   $ seq 1000000 | tr -d '\n' > 1.txt
>   $ ls -lhog 1.txt
>   -rw-r--r-- 1 5.7M May 13 00:05 1.txt
>
>   $ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
>   $ time wc -m < 1.txt
>   5888896
>
>   real    0m0.136s
>   user    0m0.104s
>   sys     0m0.004s
>
> versus:
>
>   $ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
>   $ time ./wcm.py < 1.txt
>   5888896
>
>   real    0m0.215s
>   user    0m0.040s
>   sys     0m0.012s
>
> In my measurements python is twice as slow (for input with no newlines).
> But the file is so small (5.7MB) that measurements can vary a lot.
>
>
> Third:
> If the file does have new lines (as is more common in typical text
> files), then python becomes almost order of magnitude slower:
>
>   $ seq 1000000 > 2.txt
>   $ ls -lhog 2.txt
>   -rw-r--r-- 1 6.6M May 13 00:08 2.txt
>
>   $ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
>   $ time wc -m < 2.txt
>   6888896
>
>   real    0m0.158s
>   user    0m0.132s
>   sys     0m0.000s
>
>   $ sudo sh -c "sync; echo 3 > /proc/sys/vm/drop_caches"
>   $ time ./wcm.py < 2.txt
>   5888896
>
>   real    0m1.260s
>   user    0m1.104s
>   sys     0m0.016s
>
>
>
> Fourth,
> Unless you are certain your input files are valid,
> using python2 + utf8 is very fragile, example:
>
>   $ printf '\xEEabc\n' | ./wcm.py
>   Traceback (most recent call last):
> File "./wcm.py", line 5, in <module>
>   l += len(line.rstrip('\n').decode('utf-8'))
> File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
>   return codecs.utf_8_decode(input, errors, True)
>   UnicodeDecodeError: 'utf8' codec can't decode byte 0xee in position 0:
>   invalid continuation byte
>
> While 'wc -m' will continue and not crash:
>
>   $ printf '\xEEabc\n' | wc -m
>   4
>
>
>
> I hope this resolves the issue.
> If you still think this is a bug, please provide more details
> and a reproducible example.
>
> regards,
>  - assaf



-- 
Regards,
Peng

performance bug of `wc -m`

2018-05-12 Thread Peng Yu

Hi,

The following example shows that `wc -m` is even slower than the
equivalent Python code. Can this performance bug be fixed?

$ cat wcm.py
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:

import sys
l = 0
for line in sys.stdin:
	l += len(line.rstrip('\n').decode('utf-8'))
print l

$ time ./wcm.py < 1.txt
6786930

real    0m0.155s
user    0m0.059s
sys     0m0.048s
$ time wc -m < 1.txt
6796930

real    0m2.350s
user    0m2.280s
sys     0m0.017s
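
For comparison, a Python 3 sketch of the same count that keeps newlines
in the total. It assumes invalid bytes should be replaced with U+FFFD
and counted instead of raising an error, so its result on invalid input
may differ from `wc -m`. The name `count_chars` is made up for this
example.

```python
import io

def count_chars(stream):
    # Decode as UTF-8, replacing invalid bytes instead of raising, and
    # count every character, newlines included.
    text = io.TextIOWrapper(stream, encoding='utf-8', errors='replace', newline='')
    return sum(len(line) for line in text)

print(count_chars(io.BytesIO(b'1\n2\n3\n')))
```

To read standard input, pass `sys.stdin.buffer` as the stream.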

-- 
Regards,
Peng

What time is `sleep` based on?

2018-03-20 Thread Peng Yu

For example, if I run `sleep 1000` and then I put the computer to
sleep for 1000s and wake the computer up. Will the `sleep` finish at
the time when the computer wakes up? Or `sleep` will take another 1000
seconds to terminate? Thanks.

-- 
Regards,
Peng

Re: Is there a way to print unicode characters and the actual code?

2018-02-24 Thread Peng Yu

> $ od -An -tx1 -ta -tc <<< 'exámple'
>   65  78  c3  a1  6d  70  6c  65  0a
>e   x   C   !   m   p   l   e  nl
>e   x 303 241   m   p   l   e  \n

At this moment, I wrote some python code to do this, which prints both
the decoded code as well as the encoded code in both hex and binary
numbers in TSV format.

`c if ord(c)>31 else repr(str(c)).strip("'")` is hacky. I am not sure
if there is a good way get things like \f \b as `od` would.

$ cat dumpunicode0.py
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:

import sys

for line in sys.stdin:
	for c in line.decode('utf-8'):
		utf8_encode = '0x' + ''.join(
				['%x' % ord(x) for x in reversed(c.encode('utf-8'))]
				)
		print '\t'.join(
				(
					c if ord(c)>31 else repr(str(c)).strip("'")
					, '0x%x' % ord(c)
					, bin(ord(c)).strip("'")
					, utf8_encode
					, bin(int(utf8_encode, base=16)).strip("'")
					)
				)
$ ./dumpunicode0.py <<< á
á	0xe1	0b11100001	0xa1c3	0b1010000111000011
\n	0xa	0b1010	0xa	0b1010
$ printf '\f'| od -xc
0000000    000c
         \f
0000001
$ printf '\f'| ./dumpunicode0.py
\x0c	0xc	0b1100	0xc	0b1100

-- 
Regards,
Peng

Is there a way to print unicode characters and the actual code?

2018-02-24 Thread Peng Yu

I am not sure `od` respects unicode.

Is there a tool (maybe different from od) that can print the code in
odd lines and the unicode character in even lines? Thanks.

$ od -xc <<< 'exámple'
0000000    7865    a1c3    706d    656c    000a
          e   x   ?   ?   m   p   l   e  \n
0000011

In this particular case,

65 78   a1c3706d656c000a
e x á m p l e \n

-- 
Regards,
Peng

Is there a way to print unicode characters and the actual code?

2018-02-24 Thread Peng Yu

It seems that `od` does not respect the unicode.

Is there a tool (maybe different from od) that can print the code in
odd lines and the unicode character in even lines? Thanks.

$ od -xc <<< 'exámple'
0000000    7865    a1c3    706d    656c    000a
          e   x   ?   ?   m   p   l   e  \n
0000011

In this particular case, I'd like it to print something like the
following (positions are omitted). Is there a tool for doing so?

65 78 c1 6d 70 6c 65 0a
e x á m p l e \n
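
A sketch of such a tool in Python 3, printing each character next to its
code point and its UTF-8 bytes, one character per row instead of the
two-line layout above. The name `describe` and the column layout are
choices made for this example.

```python
def describe(text):
    # Yield (character, code point, UTF-8 bytes in hex) for each character.
    for ch in text:
        shown = ch if ch.isprintable() else ch.encode('unicode_escape').decode('ascii')
        utf8 = ' '.join(f'{b:02x}' for b in ch.encode('utf-8'))
        yield shown, f'U+{ord(ch):04X}', utf8

for row in describe('exámple\n'):
    print('\t'.join(row))
```

Non-printable characters are shown with Python's escape notation, so a
newline appears as \n and a form feed as \x0c.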

-- 
Regards,
Peng

Mapping of the special characters to the control sequences available?

2018-02-09 Thread Peng Yu

Hi,

The following URL says control-v followed by control-m will insert a CR.

https://superuser.com/questions/942217/how-do-i-interactively-type-r-n-terminated-query-in-netcat?answertab=active#tab-top

I understand control-v is to enter the next character typed literally.
And control-m is a CR.

https://en.wikipedia.org/wiki/Carriage_return

Is there a complete table of the mapping of the special characters to
the control sequences?
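
In ASCII the mapping follows one rule: Control plus a key sends that
key's code with the 0x40 bit flipped, for @, A-Z and [ \ ] ^ _. A sketch
that prints the table; `ctrl` and `caret` are names made up for this
example, and a terminal may intercept some of these keys before a
program sees them.

```python
def ctrl(letter):
    # Control-<letter> is the uppercase letter's ASCII code XOR 0x40.
    return chr(ord(letter.upper()) ^ 0x40)

def caret(ch):
    # Inverse mapping: show a control character in caret notation.
    return '^' + chr(ord(ch) ^ 0x40)

for code in range(32):
    print(f'{code:3d}  0x{code:02x}  {caret(chr(code))}')
```

For example, Control-M gives code 13 (CR) and Control-[ gives code 27
(ESC).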

-- 
Regards,
Peng

Why is `find -name '*.txt'` much slower than '*.txt' on glusterfs?

2018-01-19 Thread Peng Yu

Hi,

There are ~7000 .txt files in a directory on glusterfs. Here are the
run times of the following two commands. Does anybody know why the find
command is much slower than *.txt? Is there a way to change the API
that `find` uses to search files so that it can be more friendly to
glusterfs?

$ time echo *.txt > /dev/null

real    0m2.206s
user    0m0.039s
sys     0m0.056s
$ time find -name '*.txt' > /dev/null

real    0m18.558s
user    0m0.317s
sys     0m0.663s

-- 
Regards,
Peng

Why cut treats one column input differently for out-of-range field spec?

2018-01-17 Thread Peng Yu

Hi,

If there is only one column in the input, then an out-of-range field
spec will result in the print of the whole line.

$ cut -f 3 <<< $'a' | xxd
0000000: 610a                                     a.

Otherwise, an empty string is printed.

$ cut -f 3 <<< $'a\tb' | xxd
0000000: 0a                                       .

This is counter-intuitive. I think that one-field input should not be
treated specially. It should still result in no output for an
out-of-range field spec.

Is there a strong reason why `cut` should treat one-field input
different? (What if users do want empty string be printed for
out-of-range field even for one-field input?) Or this should be
considered as a bug?
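
For context, POSIX specifies that `cut -f` passes lines containing no
delimiter through unchanged unless -s is given. A sketch of that rule in
Python; `cut_field` and its `only_delimited` parameter are made up for
this example, with `only_delimited` playing the role of -s.

```python
def cut_field(line, n, delim='\t', only_delimited=False):
    # Lines without any delimiter are passed through whole, unless
    # only_delimited (the analogue of cut -s) is set.
    if delim not in line:
        return None if only_delimited else line
    fields = line.split(delim)
    # An out-of-range field on a delimited line yields an empty string.
    return fields[n - 1] if n <= len(fields) else ''

print(repr(cut_field('a', 3)))
print(repr(cut_field('a\tb', 3)))
print(repr(cut_field('a', 3, only_delimited=True)))
```

Here None stands for "line suppressed", which is what -s does to
one-field input.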

-- 
Regards,
Peng

Speed up sort with concurrency

2018-01-14 Thread Peng Yu

Hi, I see that concurrency can be used to speed up mergesort in golang. Can
this be implemented in sort in coreutils? Thanks.

https://medium.com/@_orcaman/when-too-much-concurrency-slows-you-down-golang-9c144ca305a
-- 
Regards,
Peng

Is there a way to always put NA before (or after) numerical values in sort?

2017-12-08 Thread Peng Yu

Hi,

I want to always put NA before (or after) numerical values being
sorted. Is there a way to control this? Thanks.

~$ printf '%s\n' .1 1 NA | sort -k 1,1rg
1
.1
NA
~$ printf '%s\n' .1 1 NA | sort -k 1,1g
NA
.1
1
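
Outside of `sort`, the placement can be made explicit with a two-part
key. A sketch in Python; `numeric_key` and its `na_last` parameter are
names made up for this example.

```python
def numeric_key(value, na_last=True):
    # The first tuple element separates NA from numbers; the second
    # orders the numbers among themselves.
    if value == 'NA':
        return (1 if na_last else -1, 0.0)
    return (0, float(value))

values = ['.1', '1', 'NA']
print(sorted(values, key=numeric_key))
print(sorted(values, key=lambda v: numeric_key(v, na_last=False)))
```

The numbers stay in ascending order in both calls; only the position of
NA changes.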

-- 
Regards,
Peng

How to sort alphabetically?

2017-08-13 Thread Peng Yu

Hi, "B" is listed before "a". Is there a way to sort alphabetically
(as in an English dictionary)? (I think LC_* might need to be used,
but I am not sure what value it should be.) Thanks.

$ printf '%s\n' a B c | sort
B
a
c
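
The dictionary-style order asked for here can be sketched in Python as a
case-insensitive comparison with the original string as a tie-break;
`dictionary_key` is a name made up for this example.

```python
def dictionary_key(s):
    # Compare case-insensitively first; the original string breaks ties.
    return (s.casefold(), s)

print(sorted(['a', 'B', 'c', 'b', 'A'], key=dictionary_key))
```

With `sort` itself, the -f (--ignore-case) option is the closest
built-in equivalent.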

-- 
Regards,
Peng

Sort differently on mac with some LC_ALL

2016-12-10 Thread Peng Yu

On mac, all the following LC_ALL result in the same results of sort.


LC_ALL=en_US.UTF-8 sort <<< $'a\nb\nA\nB'
A
B
a
b
LC_ALL=en_US sort <<< $'a\nb\nA\nB'
A
B
a
b
LC_ALL=C sort <<< $'a\nb\nA\nB'
A
B
a
b

But they are not all the same on linux. Does anybody know a LC_ALL on
mac that would make sort sort differently? Thanks.

LC_ALL=en_US.UTF-8 sort <<< $'a\nb\nA\nB'
a
A
b
B
LC_ALL=en_US sort <<< $'a\nb\nA\nB'
A
B
a
b
LC_ALL=C sort <<< $'a\nb\nA\nB'
A
B
a
b

-- 
Regards,
Peng

Does -e overrule -f in readlink?

2016-09-24 Thread Peng Yu

Hi, It seems that -e overrules -f in readlink at least according to
the following. If so, when -e is specified, specification of -f does
not change the result of readlink. Is it the case?

tmpdir=$(mktemp -d)
cd "$tmpdir"
ln -s z.txt d.txt
readlink -f d.txt
readlink -f -e d.txt || echo "$?"
readlink -e d.txt || echo "$?"

-- 
Regards,
Peng

Is there a way to specify the next business day in date?

2016-08-12 Thread Peng Yu

Hi, I don't see a way to specify the next business day in date. Does
anybody see if it is possible with date?
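
The calculation itself is short when only weekends are skipped. A sketch
in Python that knows nothing about public holidays;
`next_business_day` is a name made up for this example.

```python
from datetime import date, timedelta

def next_business_day(d):
    # Step forward one day at a time, skipping Saturday (5) and Sunday (6).
    d += timedelta(days=1)
    while d.weekday() >= 5:
        d += timedelta(days=1)
    return d

print(next_business_day(date(2016, 8, 12)))  # 2016-08-12 is a Friday
```

A holiday calendar would have to be supplied separately, for example as
a set of dates checked in the same loop.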

-- 
Regards,
Peng

ls when some directory only has one file/dir?

2016-05-31 Thread Peng Yu

Hi, github can directly show the nested dir when a directory only has
one subdir (e.g., inst/include on the following webpage).

https://github.com/imbs-hl/ranger/tree/master/ranger-r-package/ranger

I think that this is a good idea. Maybe this feature should be
included in ls as well?

-- 
Regards,
Peng

How to ignore an empty file with paste

2016-03-04 Thread Peng Yu

Hi, This example shows that an empty file will be used to create an
empty column. But in some cases, it makes more sense to just ignore
such a column. Is there a way to instruct paste to ignore an empty
file?

$ > empty_file
$ paste empty_file <(seq 3)
1
2
3

-- 
Regards,
Peng

What is the best way to touch a file and set its time to the latest time of a bunch of other files?

2015-08-07 Thread Peng Yu

Hi, `touch -r` allows one to set the time of a file to that of a
reference file. What if one wants to set the time to the latest time
of multiple files? Is there an easy way to do so?
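
One way to do this outside of `touch`, sketched in Python: take the
newest modification time among the reference files and apply it to the
target. `touch_latest` is a hypothetical helper name.

```python
import os

def touch_latest(target, references):
    # Use the newest modification time among the reference files.
    latest = max(os.stat(p).st_mtime_ns for p in references)
    # Create the target if it does not exist, as touch would.
    with open(target, 'a'):
        pass
    # Set both access and modification time to that value.
    os.utime(target, ns=(latest, latest))
```

With the shell tools, the same effect needs a first step that finds the
newest reference file, which can then be handed to `touch -r`.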

-- 
Regards,
Peng

Why cp a directory into itself still create an empty directory?

2015-06-14 Thread Peng Yu

Hi,

The following code shows that cp of a directory into itself still creates
the tmp directory in the destination. Is it better not to create it?

/tmp$ mkdir tmp
/tmp$ $(type -P cp)  -r tmp tmp
/usr/local/opt/coreutils/libexec/gnubin/cp: cannot copy a directory,
‘tmp’, into itself, ‘tmp/tmp’
/tmp$ ls -lgd /tmp/tmp
drwxr-xr-x 3 wheel 102 Jun 14 11:05 /tmp/tmp

-- 
Regards,
Peng

Is `ls` exactly the same as `dir`?

2015-05-12 Thread Peng Yu

Hi, It seems that `ls` and `dir` are exactly the same after I read the
man pages. Is it the case?

-- 
Regards,
Peng

ls does not show broken link in red (coreutils installed from MacPorts)

2015-04-24 Thread Peng Yu

Hi, `ls` does not show broken links in red. Does anybody know what is wrong?

I show the things with ls below.

/tmp$ echo $LS_COLORS

/tmp$ dircolors
LS_COLORS='rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.axv=01;35:*.anx=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.axa=00;36:*.oga=00;36:*.spx=00;36:*.xspf=00;36:';
export LS_COLORS
/tmp$ echo $TERM
xterm-256color
/tmp$ dircolors -p
# Configuration file for dircolors, a utility to help you set the
# LS_COLORS environment variable used by GNU ls with the --color option.
# Copyright (C) 1996-2014 Free Software Foundation, Inc.
# Copying and distribution of this file, with or without modification,
# are permitted provided the copyright notice and this notice are preserved.
# The keywords COLOR, OPTIONS, and EIGHTBIT (honored by the
# slackware version of dircolors) are recognized but ignored.
# Below, there should be one TERM entry for each termtype that is colorizable
TERM Eterm
TERM ansi
TERM color-xterm
TERM con132x25
TERM con132x30
TERM con132x43
TERM con132x60
TERM con80x25
TERM con80x28
TERM con80x30
TERM con80x43
TERM con80x50
TERM con80x60
TERM cons25
TERM console
TERM cygwin
TERM dtterm
TERM eterm-color
TERM gnome
TERM gnome-256color
TERM hurd
TERM jfbterm
TERM konsole
TERM kterm
TERM linux
TERM linux-c
TERM mach-color
TERM mach-gnu-color
TERM mlterm
TERM putty
TERM putty-256color
TERM rxvt
TERM rxvt-256color
TERM rxvt-cygwin
TERM rxvt-cygwin-native
TERM rxvt-unicode
TERM rxvt-unicode-256color
TERM rxvt-unicode256
TERM screen
TERM screen-256color
TERM screen-256color-bce
TERM screen-bce
TERM screen-w
TERM screen.Eterm
TERM screen.rxvt
TERM screen.linux
TERM st
TERM st-256color
TERM terminator
TERM vt100
TERM xterm
TERM xterm-16color
TERM xterm-256color
TERM xterm-88color
TERM xterm-color
TERM xterm-debian
# Below are the color init strings for the basic file types. A color init
# string consists of one or more of the following numeric codes:
# Attribute codes:
# 00=none 01=bold 04=underscore 05=blink 07=reverse 08=concealed
# Text color codes:
# 30=black 31=red 32=green 33=yellow 34=blue 35=magenta 36=cyan 37=white
# Background color codes:
# 40=black 41=red 42=green 43=yellow 44=blue 45=magenta 46=cyan 47=white
#NORMAL 00 # no color code at all
#FILE 00 # regular file: use no color at all
RESET 0 # reset to "normal" color
DIR 01;34 # directory
LINK 01;36 # symbolic link. (If you set this to 'target' instead of a
 # numerical value, the color is as for the file pointed to.)
MULTIHARDLINK 00 # regular file with more than one link
FIFO 40;33 # pipe
SOCK 01;35 # socket
DOOR 01;35 # door
BLK 40;33;01 # block device driver
CHR 40;33;01 # character device driver
ORPHAN 40;31;01 # symlink to nonexistent file, or non-stat'able file
SETUID 37;41 # file that is setuid (u+s)
SETGID 30;43 # file that is setgid (g+s)
CAPABILITY 30;41 # file with capability
STICKY_OTHER_WRITABLE 30;42 # dir that is sticky and other-writable (+t,o+w)
OTHER_WRITABLE 34;42 # dir that is other-writable (o+w) and not sticky
STICKY 37;44 # dir with the sticky bit set (+t) and not other-writable
# This is for files with execute permission:
EXEC 01;32
# List any file extensions like '.gz' or '.tar' that you would like ls
# to colorize below. Put the extension, a space, and the color init string.
# (and any comments you want to add after a '#')
# If you use DOS-style suffixes, you may want to uncomment the following:
#.cmd 01;32 # executables (bright green)
#.exe 01;32
#.com 01;32
#.btm 01;32
#.bat 01;32
# Or if you want to colorize scripts even if they do not have the
# executable bit actually set.
#.sh 01;32
#.csh 01;32
 # archives or compressed (bright red)
.tar 01;31
.tgz 01;31
.arc 01;31
.arj 01;31
.taz 01;31
.lha 01;31
.lz4 01;31
.lzh 01;3

Re: Does sort handle -t / correctly

2015-04-17 Thread Peng Yu

On Fri, Apr 17, 2015 at 2:05 PM, Peng Yu  wrote:
> On Fri, Apr 17, 2015 at 12:31 PM, Eric Blake  wrote:
>> On 04/17/2015 11:03 AM, Peng Yu wrote:
>>> On Fri, Apr 17, 2015 at 11:26 AM, Eric Blake  wrote:
>>>> On 04/17/2015 10:10 AM, Peng Yu wrote:
>>>>> Hi, I got the following results when I call sort with -t /. It seems
>>>>> that 'a/1.txt' should be right after 'a'. Is it the case? Or am I not
>>>>> using sort correctly?
>>>>
>>>> Your assumption is correct - you are using sort incorrectly, by failing
>>>> to take locales into account, and by failing to limit the amount of data
>>>> being compared to single field widths.
>>>
>>> Thanks for the explanation.
>>>
>>> If I don't know the number of fields, but I want to sort according to
>>> all fields (from 1 to whatever the max number of fields), is there a
>>> way to do it?
>>
>> No one has really asked for that before.  Are you going to propose some
>> possible extension syntax to make it obvious how to generate as many key
>> specifications as necessary to fully cover an arbitrary number of fields
>> in a line?
>
> Since giving no -k option means treating each line as a single string,
> maybe -k without any column specification could mean treating each line
> as the full set of fields in that line?

BTW, one application of this syntax is sorting `find`'s output, i.e.,
one wants the things under a directory to come right after the directory
name itself.

My proposed syntax would work for this problem. But maybe there is an
alternative solution to this problem?

-- 
Regards,
Peng
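
One possible alternative, sketched here as an assumption and not as
something proposed in the thread: map '/' to a byte that sorts below
every printable character, sort in the C locale, and map it back. The
function name sort_paths is made up for illustration, and the sketch
assumes that no path contains the byte 0x01 or a newline.

```shell
# Sort paths so that everything under a directory follows the directory.
# '/' is temporarily replaced by byte 0x01, which compares lower than any
# printable character in the C locale, so "a/..." sorts before "a!".
sort_paths() {
    tr '/' '\001' | LC_ALL=C sort | tr '\001' '/'
}

# Example with the data from this thread:
printf '%s\n' a 'a!' ab aB a/1.txt | sort_paths
```

For `find` output this would be used as `find . | sort_paths`.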

Re: Does sort handle -t / correctly

2015-04-17 Thread Peng Yu

On Fri, Apr 17, 2015 at 12:31 PM, Eric Blake  wrote:
> On 04/17/2015 11:03 AM, Peng Yu wrote:
>> On Fri, Apr 17, 2015 at 11:26 AM, Eric Blake  wrote:
>>> On 04/17/2015 10:10 AM, Peng Yu wrote:
>>>> Hi, I got the following results when I call sort with -t /. It seems
>>>> that 'a/1.txt' should be right after 'a'. Is it the case? Or am I not
>>>> using sort correctly?
>>>
>>> Your assumption is correct - you are using sort incorrectly, by failing
>>> to take locales into account, and by failing to limit the amount of data
>>> being compared to single field widths.
>>
>> Thanks for the explanation.
>>
>> If I don't know the number of fields, but I want to sort according to
>> all fields (from 1 to whatever the max number of fields), is there a
>> way to do it?
>
> No one has really asked for that before.  Are you going to propose some
> possible extension syntax to make it obvious how to generate as many key
> specifications as necessary to fully cover an arbitrary number of fields
> in a line?

Since giving no -k option means treating each line as a single string,
maybe -k without any column specification could mean treating each line
as the full set of fields in that line?

-- 
Regards,
Peng
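
Until such a syntax exists, the key specifications can be generated by a
small wrapper. The following is only a sketch under stated assumptions:
the name sort_by_all_fields is hypothetical, the input is a regular file
(it is read twice), the separator is '/', and the C locale is acceptable.

```shell
# Build "-k 1,1 -k 2,2 ... -k N,N", where N is the largest number of
# '/'-separated fields found on any line, then sort with those keys.
sort_by_all_fields() {
    file=$1
    # Largest field count in the file (0 for an empty file).
    n=$(awk -F/ 'NF > max { max = NF } END { print max + 0 }' "$file")
    set --                      # reuse the positional parameters as the key list
    i=1
    while [ "$i" -le "$n" ]; do
        set -- "$@" -k "$i,$i"
        i=$((i + 1))
    done
    LC_ALL=C sort -t / "$@" -- "$file"
}
```

Called as `sort_by_all_fields paths.txt`, where paths.txt is a placeholder
file name.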

Re: Does sort handle -t / correctly

2015-04-17 Thread Peng Yu

On Fri, Apr 17, 2015 at 11:26 AM, Eric Blake  wrote:
> On 04/17/2015 10:10 AM, Peng Yu wrote:
>> Hi, I got the following results when I call sort with -t /. It seems
>> that 'a/1.txt' should be right after 'a'. Is it the case? Or am I not
>> using sort correctly?
>
> Your assumption is correct - you are using sort incorrectly, by failing
> to take locales into account, and by failing to limit the amount of data
> being compared to single field widths.

Thanks for the explanation.

If I don't know the number of fields, but I want to sort according to
all fields (from 1 to whatever the max number of fields), is there a
way to do it?

>> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort -t / -k 1 -k 2 -k 3 -k 4
>> a
>> a!
>> a/1.txt
>> aB
>> ab
>
> sort --debug is your friend:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1 -k 2 -k 3 -k 4
> sort: using ‘en_US.UTF-8’ sorting rules
> a
> _
>  ^ no match for key
>  ^ no match for key
>  ^ no match for key
> _
> a!
> __
>   ^ no match for key
>   ^ no match for key
>   ^ no match for key
> __
> a/1.txt
> ___
>   _
>^ no match for key
>^ no match for key
> ___
> ab
> __
>   ^ no match for key
>   ^ no match for key
>   ^ no match for key
> __
> aB
> __
>   ^ no match for key
>   ^ no match for key
>   ^ no match for key
> __
>
>
> As shown in the debug trace, the line 'a!' sorts prior to the line
> 'a/1.txt' because your first sort key is the entire line, and in the
> locale you are using (where both '!' and '/', and also '.', are ignored
> in collation orders), the collation string "a" really does come before
> "a1txt".
>
> What you REALLY want is to limit your sorting to a single field at a
> time (-k1,1 rather than -k1), as in:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | sort --debug -t / -k 1,1 -k 2,2
> sort: using ‘en_US.UTF-8’ sorting rules
> a
> _
>  ^ no match for key
> _
> a/1.txt
> _
>   _
> ___
> a!
> __
>   ^ no match for key
> __
> ab
> __
>   ^ no match for key
> __
> aB
> __
>   ^ no match for key
> __
>
>
> Or additionally, to limit your sorting to a locale that does not discard
> punctuation as unimportant, as in:
>
> $ printf '%s\n' a 'a!' ab aB a/1.txt | LC_ALL=C sort --debug -t / -k 1,1 -k 2
> sort: using simple byte comparison
> a
> _
>  ^ no match for key
> _
> a/1.txt
> _
>   _
> ___
> a!
> __
>   ^ no match for key
> __
> aB
> __
>   ^ no match for key
> __
> ab
> __
>   ^ no match for key
> __
>
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>



-- 
Regards,
Peng

Does sort handle -t / correctly

2015-04-17 Thread Peng Yu

Hi, I got the following results when I call sort with -t /. It seems
that 'a/1.txt' should be right after 'a'. Is it the case? Or am I not
using sort correctly?

$ printf '%s\n' a 'a!' ab aB a/1.txt | sort -t / -k 1 -k 2 -k 3 -k 4
a
a!
a/1.txt
aB
ab

-- 
Regards,
Peng

Is there a way to inherent the permissions related with o from the parent directory?

2015-03-17 Thread Peng Yu

Hi,

Is there a way to inherit the permissions for o (others) from the parent?

For example, if the parent has the permission --- for o, when I mkdir
a subdirectory, I want the subdirectory to also have the permission ---
for o. Is it possible to somehow chmod the parent to make this happen?

-- 
Regards,
Peng
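
As far as I know, a plain chmod on the parent does not make the "other"
bits inherit. One workaround, shown only as a sketch: read the parent's
mode and pass it to mkdir explicitly. The name mkdir_like_parent is
hypothetical, and `stat -c '%a'` assumes GNU stat (BSD and Mac OS X stat
use different options).

```shell
# Create a directory whose mode is copied from its parent directory.
mkdir_like_parent() {
    parent=$(dirname -- "$1")
    mode=$(stat -c '%a' -- "$parent")   # octal mode of the parent, e.g. 750
    mkdir -m "$mode" -- "$1"            # a numeric -m is applied as given, not reduced by the umask
}

# Usage (paths are placeholders):
#   chmod o= /some/parent
#   mkdir_like_parent /some/parent/sub
```

Where ACLs are available, a default ACL on the parent, for example
`setfacl -d -m o::--- parent`, is another option worth checking; that
assumes a filesystem with ACL support and the setfacl tool installed.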

Re: Document for + seems to be missing in ls' document

2015-03-13 Thread Peng Yu

> That's one of the reasons that I _like_ the 'html' version of the
> manuals MUCH more than the 'info' version - you can choose to view the
> entire manual at once, at which point, a simple 'ctrl-f' will let your
> browser find the relevant text within the manual regardless of the
> 'texinfo's division of information into sections.

The real point is that people want to see the whole manual at once. If so,
why not make such a choice available at the command line? I find it
cumbersome to have to use a browser while I am at the command line.

Is there a way to view the entire Texinfo manual at once at the command line?

-- 
Regards,
Peng

-e missing for ls on Mac OS X

2015-03-12 Thread Peng Yu

Hi,

Mac OS X's ls has an option -e, which is related to ACLs. But coreutils'
ls does not have this option, which makes coreutils' ls an incomplete
replacement for Mac OS X's ls. Is it possible to add this feature to
coreutils' ls?

-- 
Regards,
Peng

Re: Document for + seems to be missing in ls' document

2015-03-11 Thread Peng Yu

On Wed, Mar 11, 2015 at 4:25 PM, Eric Blake  wrote:
> On 03/11/2015 03:13 PM, Peng Yu wrote:
>> Hi,
>>
>> It seems that the documentation for ls in coreutils does not have an
>> explanation of +. Should this be added? Thanks.
>>
>> http://serverfault.com/questions/227852/what-does-a-mean-at-the-end-of-the-permissions-from-ls-l
>
> It is already there:
>
> $ info coreutils 'What information is listed'
> ...
>  Following the file mode bits is a single character that specifies
>  whether an alternate access method such as an access control list
>  applies to the file.  When the character following the file mode
>  bits is a space, there is no alternate access method.  When it is a
>  printing character, then there is such a method.
>
>  GNU 'ls' uses a '.' character to indicate a file with an SELinux
>  security context, but no other alternate access method.
>
>  A file with any other combination of alternate access methods is
>  marked with a '+' character.

Should the information about "+" be added to the manpage as well?

-- 
Regards,
Peng

Document for + seems to be missing in ls' document

2015-03-11 Thread Peng Yu

Hi,

It seems that the documentation for ls in coreutils does not have an
explanation of +. Should this be added? Thanks.

http://serverfault.com/questions/227852/what-does-a-mean-at-the-end-of-the-permissions-from-ls-l

-- 
Regards,
Peng

Where are the OPTS bdfgiMhnRrV of --key of sort documented?

2014-12-25 Thread Peng Yu

Hi,

I am trying to find the detailed meaning of bdfgiMhnRrV, but I cannot
find it in the manpage or the info page. Does anybody know where they
are documented? Thanks.

-- 
Regards,
Peng

Is there an easy way to generate all English letters?

2014-10-04 Thread Peng Yu

Hi,

seq can generate numbers easily. Is there an easy way to generate all
English letters that anybody knows?

-- 
Regards,
Peng
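
One portable possibility, offered as a sketch and not as an answer from
the list: awk's printf with %c turns a character code into a character,
so looping over the ASCII codes 97 to 122 prints a to z. The function
name letters is made up for illustration.

```shell
# Print the lowercase English letters, one per line, much like seq does
# for numbers. 97..122 are the ASCII codes of 'a'..'z'.
letters() {
    awk 'BEGIN { for (i = 97; i <= 122; i++) printf "%c\n", i }'
}

letters
```

In bash specifically, brace expansion gives the same list with
`printf '%s\n' {a..z}`; that form is not available in plain POSIX sh.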

Re: Why the memory usage of sort does not seem to increase as the input file size increases?

2014-05-26 Thread Peng Yu

> Sort takes a divide and conquer approach,
> by sorting parts of the input to temporary files,
> and then merging the results with a bounded amount of memory.
>
> sort currently defaults to using a large memory buffer
> to minimize overhead associated with writing and reading
> temp files, so you may be seeing just this large memory
> allocation each time.
>
> The memory allocation can be controlled with --buffer-size

If I have enough memory, is it always faster to sort without using
temp files? How can I force sort to always use memory only? Thanks.

-- 
Regards,
Peng
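
For reference, a minimal sketch of the --buffer-size option mentioned in
the quoted reply (short form -S). The wrapper name sort_with_buffer is
hypothetical. That a buffer comfortably larger than the input avoids
temporary files is an assumption about typical behaviour, not a
guarantee stated in this thread.

```shell
# Run sort with an explicit main-memory buffer size.
# SIZE accepts suffixes such as K, M, G, or % of physical memory.
sort_with_buffer() {
    size=$1
    shift
    sort --buffer-size="$size" "$@"
}

# Example: numeric sort of a small input with a 1 MiB buffer.
printf '%s\n' 3 1 2 | sort_with_buffer 1M -n
```

For a large file one would pass something like `sort_with_buffer 4G big.txt`,
where both the size and the file name are placeholders.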

Why the memory usage of sort does not seem to increase as the input file size increases?

2014-05-26 Thread Peng Yu

Hi,

I tried "sort" on a large file, but the memory usage of "sort" does
not seem to be large. This seems strange to me, as I think that
sort needs to see all the data before completing the sorting process.
Shouldn't the memory usage of "sort" increase as the input size
increases? Thanks.

-- 
Regards,
Peng

Is the command `sort input.txt -o input.txt` OK?

2014-03-15 Thread Peng Yu

Hi,

`sort input.txt -o input.txt` overwrites the input file. My
understanding is that sort reads everything and then writes the output,
so it is OK to overwrite the original file. But I want to be sure. Can
anyone confirm that this is the case? Thanks.

-- 
Regards,
Peng
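
A small check of this behaviour, as a sketch: to my understanding the GNU
coreutils manual says that sort reads all input before opening the output
file, except when merging with -m. The function name sort_in_place is
made up for illustration; it puts -o before the operand, which is the
more portable ordering.

```shell
# Sort a file in place by naming it as both output and input.
sort_in_place() {
    sort -o "$1" -- "$1"
}

# Demonstration on a temporary file.
f=$(mktemp)
printf '%s\n' c b a > "$f"
sort_in_place "$f"
cat "$f"
rm -f "$f"
```

Redirecting instead, as in `sort input.txt > input.txt`, is not
equivalent: the shell truncates the file before sort reads it.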
