Re: tail

2022-05-11 Thread Avi Gross via Python-list
This seems to be a regular refrain where someone wants something as STANDARD in 
a programming language or environment and others want to keep it lean and mean 
or do not see THIS suggestion as particularly important or useful.
Looking at the end of something is extremely common. Packages like numpy/pandas 
in Python often provide functions with names like head or tail as do other 
languages where data structures with names like data.frame are commonly used. 
These structures are in some way indexed to make it easy to jump towards the 
end. Text files are not.

Efficiency aside, a 3-year-old (well, certainly a 30 year old)  can cobble 
together a function that takes a filename assumed to be textual and reads the 
file into some data structure that stores the lines of the file and so it can 
be indexed by line number and also report the index of the final line. The data 
structure can be a list of lists or a dictionary with line numbers as keys or a 
numpy ...

So the need for this functionality seems obvious but then what about someone 
who wants a bunch of random lines from a file? Need we satisfy their wish to 
pick random offsets from the file and get the line in which the offset is in 
middle of or the one about to start? Would that even be random if line lengths 
vary? Text files were never designed to be used efficiently except for reading 
and writing and certainly not for something like sorting.

Again, generally you can read in the darn file and perform the operation and 
free up whatever memory you do  not need. If you have huge files, fine, but 
then why make a special function be part of the default setup if it is rarely 
used? Why not put it in a module/package called BigFileBatches alongside other 
functions useful to do things in batches? Call that when needed but for smaller 
files, KISS.


-Original Message-
From: Dennis Lee Bieber 
To: python-list@python.org
Sent: Wed, May 11, 2022 6:15 pm
Subject: Re: tail

On Thu, 12 May 2022 06:07:18 +1000, Chris Angelico 
declaimed the following:

>I don't understand why this wants to be in the standard library.
>
    Especially as any Linux distribution probably includes the compiled
"tail" command, so this would only be of use on Windows.

    Under recent Windows, one has an equivalent to "tail" IFF using
PowerShell rather than the "DOS" shell.

https://www.middlewareinventory.com/blog/powershell-tail-file-windows-tail-command/

or install a Windows binary equivalent http://tailforwin32.sourceforge.net/


-- 
    Wulfraed                Dennis Lee Bieber        AF6VN
    wlfr...@ix.netcom.com    http://wlfraed.microdiversity.freeddns.org/
-- 
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-11 Thread Avi Gross via Python-list
Just FYI, UNIX had a bunch of utilities that could emulate a vanilla version of 
tail on a command line.
You can use sed, awk and quite a few others to simply show line N to the end of 
a file or other variations. 
Of course the way many things were done back then had less focus on efficiency 
than how to stepwise make changes in a pipeline so reading from the beginning 
to end was not an issue.



-Original Message-
From: Marco Sulla 
To: Chris Angelico 
Cc: python-list@python.org
Sent: Wed, May 11, 2022 5:27 pm
Subject: Re: tail

On Wed, 11 May 2022 at 22:09, Chris Angelico  wrote:
>
> Have you actually checked those three, or do you merely suppose them to be 
> true?

I only suppose, as I said. I should do some benchmark and some other
tests, and, frankly, I don't want to. I don't want to because I'm
quite sure the implementation is fast, since it reads by chunks and
cache them. I'm not sure it's 100% free of bugs, but the concept is
very simple, since it simply mimics the *nix tail, so it should be
reliable.

>
> > I'd very much like to see a CPython implementation of that function. It
> > could be a method of a file object opened in binary mode, and *only* in
> > binary mode.
> >
> > What do you think about it?
>
> Still not necessary. You can simply have it in your own toolkit. Why
> should it be part of the core language?

Why not?

> How much benefit would it be
> to anyone else?

I suppose that every programmer, at least one time in its life, did a tail.

> All the same assumptions are still there, so it still
> isn't general

It's general. It mimics the *nix tail. I can't think of a more general
way to implement a tail.

> I don't understand why this wants to be in the standard library.

Well, the answer is really simple: I needed it and if I found it in
the stdlib, I used it instead of writing the first horrible function.
Furthermore, tail is such a useful tool that I suppose many others are
interested, based on this quick Google search:

https://www.google.com/search?q=python+tail

A question on Stackoverflow really much voted, many other
Stackoverflow questions, a package that seems to exactly do the same
thing, that is mimic *nix tail, and a blog post about how to tail in
Python. Furthermore, if you search python tail pypi, you can find a
bunch of other packages:

https://www.google.com/search?q=python+tail+pypi

It seems the subject is quite popular, and I can't imagine otherwise.
-- 
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-11 Thread Dennis Lee Bieber
On Thu, 12 May 2022 06:07:18 +1000, Chris Angelico 
declaimed the following:

>I don't understand why this wants to be in the standard library.
>
Especially as any Linux distribution probably includes the compiled
"tail" command, so this would only be of use on Windows.

Under recent Windows, one has an equivalent to "tail" IFF using
PowerShell rather than the "DOS" shell.

https://www.middlewareinventory.com/blog/powershell-tail-file-windows-tail-command/

or install a Windows binary equivalent http://tailforwin32.sourceforge.net/


-- 
Wulfraed Dennis Lee Bieber AF6VN
wlfr...@ix.netcom.comhttp://wlfraed.microdiversity.freeddns.org/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-11 Thread Chris Angelico
On Thu, 12 May 2022 at 07:27, Marco Sulla  wrote:
>
> On Wed, 11 May 2022 at 22:09, Chris Angelico  wrote:
> >
> > Have you actually checked those three, or do you merely suppose them to be 
> > true?
>
> I only suppose, as I said. I should do some benchmark and some other
> tests, and, frankly, I don't want to. I don't want to because I'm
> quite sure the implementation is fast, since it reads by chunks and
> cache them. I'm not sure it's 100% free of bugs, but the concept is
> very simple, since it simply mimics the *nix tail, so it should be
> reliable.

If you don't care enough to benchmark it or even debug it, why should
anyone else care?

I'm done discussing. You think that someone else should have done this
for you, but you aren't even willing to put in the effort to make this
useful to anyone else. Just use it yourself and have done with it.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-11 Thread Marco Sulla
On Wed, 11 May 2022 at 22:09, Chris Angelico  wrote:
>
> Have you actually checked those three, or do you merely suppose them to be 
> true?

I only suppose, as I said. I should do some benchmark and some other
tests, and, frankly, I don't want to. I don't want to because I'm
quite sure the implementation is fast, since it reads by chunks and
cache them. I'm not sure it's 100% free of bugs, but the concept is
very simple, since it simply mimics the *nix tail, so it should be
reliable.

>
> > I'd very much like to see a CPython implementation of that function. It
> > could be a method of a file object opened in binary mode, and *only* in
> > binary mode.
> >
> > What do you think about it?
>
> Still not necessary. You can simply have it in your own toolkit. Why
> should it be part of the core language?

Why not?

> How much benefit would it be
> to anyone else?

I suppose that every programmer, at least one time in its life, did a tail.

> All the same assumptions are still there, so it still
> isn't general

It's general. It mimics the *nix tail. I can't think of a more general
way to implement a tail.

> I don't understand why this wants to be in the standard library.

Well, the answer is really simple: I needed it and if I found it in
the stdlib, I used it instead of writing the first horrible function.
Furthermore, tail is such a useful tool that I suppose many others are
interested, based on this quick Google search:

https://www.google.com/search?q=python+tail

A question on Stackoverflow really much voted, many other
Stackoverflow questions, a package that seems to exactly do the same
thing, that is mimic *nix tail, and a blog post about how to tail in
Python. Furthermore, if you search python tail pypi, you can find a
bunch of other packages:

https://www.google.com/search?q=python+tail+pypi

It seems the subject is quite popular, and I can't imagine otherwise.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-11 Thread Chris Angelico
On Thu, 12 May 2022 at 06:03, Marco Sulla  wrote:
> I suppose this function is fast. It reads the bytes from the file in chunks
> and stores them in a bytearray, prepending them to it. The final result is
> read from the bytearray and converted to bytes (to be consistent with the
> read method).
>
> I suppose the function is reliable. File is opened in binary mode and only
> b"\n" is searched as line end, as *nix tail (and python readline in binary
> mode) do. And bytes are returned. The caller can use them as is or convert
> them to a string using the encoding it wants, or do whatever its
> imagination can think :)
>
> Finally, it seems to me the function is quite simple.
>
> If all my affirmations are true, the three obstacles written by Chris
> should be passed.

Have you actually checked those three, or do you merely suppose them to be true?

> I'd very much like to see a CPython implementation of that function. It
> could be a method of a file object opened in binary mode, and *only* in
> binary mode.
>
> What do you think about it?

Still not necessary. You can simply have it in your own toolkit. Why
should it be part of the core language? How much benefit would it be
to anyone else? All the same assumptions are still there, so it still
isn't general, and you may as well just *code to your own needs* like
I've been saying all along. This does not need to be in the standard
library. Do what you need, assume what you can safely assume, and
other people can write different code.

I don't understand why this wants to be in the standard library.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: tail

2022-05-11 Thread Marco Sulla
On Mon, 9 May 2022 at 23:15, Dennis Lee Bieber 
wrote:
>
> On Mon, 9 May 2022 21:11:23 +0200, Marco Sulla
>  declaimed the following:
>
> >Nevertheless, tail is a fundamental tool in *nix. It's fast and
> >reliable. Also the tail command can't handle different encodings?
>
> Based upon
> https://github.com/coreutils/coreutils/blob/master/src/tail.c the ONLY
> thing tail looks at is single byte "\n". It does not handle other line
> endings, and appears to performs BINARY I/O, not text I/O. It does nothing
> for bytes that are not "\n". Split multi-byte encodings are irrelevant
> since, if it does not find enough "\n" bytes in the buffer (chunk) it
reads
> another binary chunk and seeks for additional "\n" bytes. Once it finds
the
> desired amount, it is synchronized on the byte following the "\n" (which,
> for multi-byte encodings might be a NUL, but in any event, should be a
safe
> location for subsequent I/O).
>
> Interpretation of encoding appears to fall to the console driver
> configuration when displaying the bytes output by tail.

Ok, I understand. This should be a Python implementation of *nix tail:

import os

_lf = b"\n"
_err_n = "Parameter n must be a positive integer number"
_err_chunk_size = "Parameter chunk_size must be a positive integer number"

def tail(filepath, n=10, chunk_size=100):
if (n <= 0):
raise ValueError(_err_n)

if (n % 1 != 0):
raise ValueError(_err_n)

if (chunk_size <= 0):
raise ValueError(_err_chunk_size)

if (chunk_size % 1 != 0):
raise ValueError(_err_chunk_size)

n_chunk_size = n * chunk_size
pos = os.stat(filepath).st_size
chunk_line_pos = -1
lines_not_found = n

with open(filepath, "rb") as f:
text = bytearray()

while pos != 0:
pos -= n_chunk_size

if pos < 0:
pos = 0

f.seek(pos)
chars = f.read(n_chunk_size)
text[0:0] = chars
search_pos = n_chunk_size

while search_pos != -1:
chunk_line_pos = chars.rfind(_lf, 0, search_pos)

if chunk_line_pos != -1:
lines_not_found -= 1

if lines_not_found == 0:
break

search_pos = chunk_line_pos

if lines_not_found == 0:
break

return bytes(text[chunk_line_pos+1:])

The function opens the file in binary mode and searches only for b"\n". It
returns the last n lines of the file as bytes.

I suppose this function is fast. It reads the bytes from the file in chunks
and stores them in a bytearray, prepending them to it. The final result is
read from the bytearray and converted to bytes (to be consistent with the
read method).

I suppose the function is reliable. File is opened in binary mode and only
b"\n" is searched as line end, as *nix tail (and python readline in binary
mode) do. And bytes are returned. The caller can use them as is or convert
them to a string using the encoding it wants, or do whatever its
imagination can think :)

Finally, it seems to me the function is quite simple.

If all my affirmations are true, the three obstacles written by Chris
should be passed.

I'd very much like to see a CPython implementation of that function. It
could be a method of a file object opened in binary mode, and *only* in
binary mode.

What do you think about it?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Changing calling sequence

2022-05-11 Thread anthony.flury via Python-list






Why not do :

  def TempsOneDayDT(date:datetime.date):
 return TempsOneDay(date.year, date.month, date.day)

No repeat of code - just a different interface to the same 
functionality.




-- Original Message --
From: "Michael F. Stemper" 
To: python-list@python.org
Sent: Wednesday, 11 May, 22 At 14:33
Subject: Changing calling sequence
I have a function that I use to retrieve daily data from a
home-brew database. Its calling sequence is;
def TempsOneDay( year, month, date ):
After using it (and its friends) for a few years, I've come to
realize that there are times where it would be advantageous to
invoke it with a datetime.date as its single argument.
As far as I can tell, there are three ways for me to proceed:
1. Write a similar function that takes a single datetime.date
   as its argument.
2. Rewrite the existing function so that it takes a single
   argument, which can be either a tuple of (year,month,date)
   or a datetime.date argument.
3. Rewrite the existing function so that its first argument
   can be either an int (for year) or a datetime.date. The
   existing month and date arguments would be optional, with
   default=None. But, if the first argument is an int, and
   either of month or date is None, an error would be raised.

The first would be the simplest. However, it is obviously WET
rather than DRY.
The second isn't too bad, but a change like this would require that
I find all places that the function is currently used and insert a
pair of parentheses. Touching this much code is risky, as well
as being a bunch of work. (Admittedly, I'd only do it once.)
The third is really klunky, but wouldn't need to touch anything
besides this function.
What are others' thoughts? Which of the approaches above looks
least undesirable (and why)? Can anybody see a fourth approach?
--
Michael F. Stemper
This post contains greater than 95% post-consumer bytes by weight.
--
https://mail.python.org/mailman/listinfo/python-list 



-- Anthony Fluryanthony.fl...@btinternet.com
--
https://mail.python.org/mailman/listinfo/python-list


Re: Changing calling sequence

2022-05-11 Thread Grant Edwards
On 2022-05-11, David Raymond  wrote:

> Maybe not the prettiest, but you could also define it like this,
> which also wouldn't require changing of any existing calls or the
> main body of the function past this if block.
>
> def TempsOneDay(*dateComponents):
> if len(dateComponents) == 3:
> year, month, date = dateComponents
> elif len(dateComponents) == 1 and isinstance(dateComponents[0], 
> datetime.date):
> year, month, date = (dateComponents[0].year, dateComponents[0].month, 
> dateComponents[0].day)
> else:
> raise Exception("Error message here")

That would be my preference were I reading the code. It makes it quite
clear that there are two completely separate signatures. I think I
would be a little confused by the 2nd and 3rd values with default
values — the implication would be that I can supply a datetime object
as the first argument and then additional month and date values in the
2nd and 3rd args. You could try to explain it with a comment, but I
tend to ignore comments...
-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Changing calling sequence

2022-05-11 Thread David Raymond
>> I have a function that I use to retrieve daily data from a
>> home-brew database. Its calling sequence is;
>> 
>> def TempsOneDay( year, month, date ):
>> 
>> After using it (and its friends) for a few years, I've come to
>> realize that there are times where it would be advantageous to
>> invoke it with a datetime.date as its single argument.
>
>You could just use all keyword args:
>
>def TempsOneDay(**kwargs):
>
> if 'date' in kwargs:
> handle_datetime(kwargs['date'])
> elif 'year' in kwargs and 'month' in kwargs and 'day' in kwargs:
> handle_args(kwargs['year'], kwargs['month'], kwargs['day'])
> else:
> raise Exception("Bad keyword args")
>
>TempsOneDay(date=datetime.datetime.now)
>
>TempsOneDay(year=2022, month=11, day=30)
>

Maybe not the prettiest, but you could also define it like this, which also 
wouldn't require changing of any existing calls or the main body of the 
function past this if block.

def TempsOneDay(*dateComponents):
if len(dateComponents) == 3:
year, month, date = dateComponents
elif len(dateComponents) == 1 and isinstance(dateComponents[0], 
datetime.date):
year, month, date = (dateComponents[0].year, dateComponents[0].month, 
dateComponents[0].day)
else:
raise Exception("Error message here")
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Changing calling sequence

2022-05-11 Thread 2QdxY4RzWzUUiLuE
On 2022-05-11 at 08:33:27 -0500,
"Michael F. Stemper"  wrote:

> I have a function that I use to retrieve daily data from a
> home-brew database. Its calling sequence is;
> 
> def TempsOneDay( year, month, date ):
> 
> After using it (and its friends) for a few years, I've come to
> realize that there are times where it would be advantageous to
> invoke it with a datetime.date as its single argument.
> 
> As far as I can tell, there are three ways for me to proceed:
> 1. Write a similar function that takes a single datetime.date
>as its argument.
> 2. Rewrite the existing function so that it takes a single
>argument, which can be either a tuple of (year,month,date)
>or a datetime.date argument.
> 3. Rewrite the existing function so that its first argument
>can be either an int (for year) or a datetime.date. The
>existing month and date arguments would be optional, with
>default=None. But, if the first argument is an int, and
>either of month or date is None, an error would be raised.
> 
> The first would be the simplest. However, it is obviously WET
> rather than DRY.

It's also the least disruptive to existing code and tests, and the most
clear to readers (whether or not they're familiar with said existing
code).

What pieces, exactly, do you think you would repeat, especially after
you extract the common logic into a new function that should be simpler
than either API-level function.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Changing calling sequence

2022-05-11 Thread Tobiah

On 5/11/22 06:33, Michael F. Stemper wrote:

I have a function that I use to retrieve daily data from a
home-brew database. Its calling sequence is;

def TempsOneDay( year, month, date ):

After using it (and its friends) for a few years, I've come to
realize that there are times where it would be advantageous to
invoke it with a datetime.date as its single argument.


You could just use all keyword args:

def TempsOneDay(**kwargs):

if 'date' in kwargs:
handle_datetime(kwargs['date'])
elif 'year' in kwargs and 'month' in kwargs and 'day' in kwargs:
handle_args(kwargs['year'], kwargs['month'], kwargs['day'])
else:
raise Exception("Bad keyword args")

TempsOneDay(date=datetime.datetime.now)

TempsOneDay(year=2022, month=11, day=30)

--
https://mail.python.org/mailman/listinfo/python-list


Changing calling sequence

2022-05-11 Thread Michael F. Stemper

I have a function that I use to retrieve daily data from a
home-brew database. Its calling sequence is;

def TempsOneDay( year, month, date ):

After using it (and its friends) for a few years, I've come to
realize that there are times where it would be advantageous to
invoke it with a datetime.date as its single argument.

As far as I can tell, there are three ways for me to proceed:
1. Write a similar function that takes a single datetime.date
   as its argument.
2. Rewrite the existing function so that it takes a single
   argument, which can be either a tuple of (year,month,date)
   or a datetime.date argument.
3. Rewrite the existing function so that its first argument
   can be either an int (for year) or a datetime.date. The
   existing month and date arguments would be optional, with
   default=None. But, if the first argument is an int, and
   either of month or date is None, an error would be raised.

The first would be the simplest. However, it is obviously WET
rather than DRY.

The second isn't too bad, but a change like this would require that
I find all places that the function is currently used and insert a
pair of parentheses. Touching this much code is risky, as well
as being a bunch of work. (Admittedly, I'd only do it once.)

The third is really klunky, but wouldn't need to touch anything
besides this function.

What are others' thoughts? Which of the approaches above looks
least undesirable (and why)? Can anybody see a fourth approach?

--
Michael F. Stemper
This post contains greater than 95% post-consumer bytes by weight.
--
https://mail.python.org/mailman/listinfo/python-list