Re: Python 2.7 range Function provokes a Memory Error

2023-03-06 Thread Chris Angelico
On Tue, 7 Mar 2023 at 16:53, Stephen Tucker  wrote:
>
> Hi again,
>
> I tried xrange, but I got an error telling me that my integer was too big
> for a C long.
>
> Clearly, xrange in Py2 is not capable of dealing with Python (that is,
> possibly very long) integers.

That's because Py2 has two different integer types, int and long.

> I am raising this because,
>
> (a) IF xrange in Py3 is a simple "port" from Py2, then it won't handle
> Python integers either.
>
> AND
>
> (b) IF xrange in Py3 is intended to be equivalent to range (which, even in
> Py2, does handle Python integers)
>
> THEN
>
> It could be argued that xrange in Py3 needs some attention from the
> developer(s).


Why don't you actually try Python 3 instead of making assumptions
based on the state of Python from more than a decade ago?
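
A minimal check in any modern Python 3 bears this out - range accepts
arbitrarily large integers and never builds a list, and xrange is gone
entirely (a quick sketch, CPython 3.x assumed):

>>> big = range(10**20, 10**20 + 10)
>>> big[3]          # computed on demand; no list is ever built
100000000000000000003
>>> len(big)
10
>>> xrange(10)
Traceback (most recent call last):
  ...
NameError: name 'xrange' is not defined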

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python 2.7 range Function provokes a Memory Error

2023-03-06 Thread Stephen Tucker
Hi again,

I tried xrange, but I got an error telling me that my integer was too big
for a C long.

Clearly, xrange in Py2 is not capable of dealing with Python (that is,
possibly very long) integers.

I am raising this because,

(a) IF xrange in Py3 is a simple "port" from Py2, then it won't handle
Python integers either.

AND

(b) IF xrange in Py3 is intended to be equivalent to range (which, even in
Py2, does handle Python integers)

THEN

It could be argued that xrange in Py3 needs some attention from the
developer(s).

Stephen Tucker.


On Thu, Mar 2, 2023 at 6:24 PM Jon Ribbens via Python-list <
python-list@python.org> wrote:

> On 2023-03-02, Stephen Tucker  wrote:
> > The range function in Python 2.7 (and yes, I know that it is now
> > superseded) provokes a Memory Error when asked to deliver a very long
> > list of values.
> >
> > I assume that this is because the function produces a list which it then
> > iterates through.
> >
> > 1. Does the  range  function in Python 3.x behave the same way?
>
> No, in Python 3 it is a lazy sequence which produces each number on
> demand instead of building a list.
>
> > 2. Is there any equivalent way that behaves more like a  for loop (that
> is,
> > without producing a list)?
>
> Yes, 'xrange' in Python 2 behaves like 'range' in Python 3.
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


[Python-announce] SCons 4.5.1 Released

2023-03-06 Thread Bill Deegan
A new SCons release, 4.5.1, is now available on the SCons download page:

https://scons.org/pages/download.html

Here is a summary of the changes since 4.5.0:

FIXES
-

- Fix a problem in 4.5.0 where code like the following caused a Clone()'d
  environment to share CPPDEFINES with the original Environment() which was
  cloned, leaking changes to CPPDEFINES that should be completely independent
  after the Clone:
    env = Environment(CPPDEFINES=['a'])
    env.Append(CPPDEFINES=['b'])   # or AppendUnique, Prepend, PrependUnique
    env1 = env.Clone()
    env1.Append(CPPDEFINES=['c'])  # or any other modification short of
                                   # overwriting CPPDEFINES
  After this, env['CPPDEFINES'] would contain 'c' when it should not.


Thanks to the following contributors listed below for their contributions
to this release.
==
.. code-block:: text

git shortlog --no-merges -ns 4.5.0..HEAD

 3  William Deegan
 1  Mats Wichmann
___
Python-announce-list mailing list -- python-announce-list@python.org
To unsubscribe send an email to python-announce-list-le...@python.org
https://mail.python.org/mailman3/lists/python-announce-list.python.org/
Member address: arch...@mail-archive.com


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread rbowman
On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:

> One issue that was also correctly foreseen by some is that there's going
> to be a new request at every user keystroke. Known problem. JavaScript
> programmers use a trick called "debouncing" to be reasonably sure that
> the user is done typing before a request is issued:
> 
> https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript

That could be annoying. My use case is address entry. When the user types

102 ma

the suggestions might be 

main
manson
maple
massachusetts
masten

in a simple case. When they enter 's' it's narrowed down. Typically I'm 
only dealing with a city or county so the data to be searched isn't huge. 
The maps.google.com address search covers the world and they're also 
throwing in a geographical constraint so the suggestions are applicable to 
the area you're viewing.  It must be nice to have a server or two...
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/4/2023 10:43 PM, Dino wrote:

> I need fast text-search on a large (not huge, let's say 30k records
> totally) list of items. Here's a sample of my raw data (a list of US
> cars: model and make)


Gentlemen, thanks a ton to everyone who offered to help (and did help!). 
I loved the part where some tried to divine the true meaning of my words :)


What you guys wrote is correct: the grep-esque search is guaranteed to 
turn up a ton of false positives, but for the autofill use-case, that's 
actually OK. Users will quickly figure out what is not relevant and skip 
those entries, just to zero in on the suggestion that they find relevant.
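
A minimal sketch of that grep-esque scan (the file name comes from my
earlier grep example; the result limit is an arbitrary assumption):

with open("all_cars_unique.csv") as fh:
    cars = fh.read().splitlines()      # ~30k rows fits comfortably in memory

def suggest(entry, limit=20):
    needle = entry.lower()             # case-insensitive, like grep -i
    return [line for line in cars if needle in line.lower()][:limit]

print(suggest("v60"))                  # e.g. ['Genesis,GV60', 'Volvo,V60']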


One issue that was also correctly foreseen by some is that there's going 
to be a new request at every user keystroke. Known problem. JavaScript 
programmers use a trick called "debouncing" to be reasonably sure that 
the user is done typing before a request is issued:


https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript

I was able to apply that successfully and I am now very pleased with the 
final result.
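
For anyone curious, the same debouncing idea sketched on the Python side
(a rough sketch only; the Debouncer class and the 300 ms delay are
invented for illustration):

import threading

class Debouncer:
    """Run fn only once `delay` seconds have passed with no newer call."""
    def __init__(self, delay, fn):
        self.delay, self.fn = delay, fn
        self._timer = None

    def call(self, *args):
        if self._timer is not None:
            self._timer.cancel()       # a newer keystroke supersedes the pending one
        self._timer = threading.Timer(self.delay, self.fn, args)
        self._timer.start()

search = Debouncer(0.3, lambda text: print("query:", text))
for partial in ["1", "10", "102", "102 m", "102 ma"]:
    search.call(partial)               # only "102 ma" fires, ~300 ms later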


Apologies if I posted a 1400-line data file. Seeing that certain 
newsgroups carry gigabytes of copyright-infringing material must have 
conveyed the wrong impression to me.


Thank you.

Dino

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Greg Ewing via Python-list

On 7/03/23 6:49 am, avi.e.gr...@gmail.com wrote:

> But the example given wanted to match something like "V6" in the middle of
> the text and I do not see how that would work as you would now need to
> search 26 dictionaries completely.


It might even make things worse, as there is likely to be a lot of
overlap between entries containing "V" and entries containing "6",
so you end up searching the same data multiple times.

--
Greg

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Greg Ewing via Python-list

On 7/03/23 4:35 am, Weatherby,Gerard wrote:

> If mailing space is a consideration, we could all help by keeping our
> replies short and to the point.


Indeed. A thread or two of untrimmed quoted messages is probably
more data than Dino posted!

--
Greg

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Thomas Passin

On 3/6/2023 12:49 PM, avi.e.gr...@gmail.com wrote:

> Thomas,
>
> I may have missed any discussion where the OP explained more about proposed
> usage. If the program is designed to load the full data once, never get
> updates except by re-reading some file, and then handles multiple requests,
> then some things may be worth doing.
>
> It looked to me, and I may well be wrong, like he wanted to search for a
> string anywhere in the text so a grep-like solution is a reasonable start
> with the actual data being stored as something like a list of character
> strings you can search "one line" at a time. I suspect a numpy variant may
> work faster.
>
> And of course any search function he builds can be made to remember some or
> all previous searches using a cache decorator. That generally uses a
> dictionary for the search keys internally.
>
> But using lots of dictionaries strikes me as only helping if you are
> searching for text anchored to the start of a line so if you ask for
> "Honda" you instead ask the dictionary called "h" and search perhaps just
> for "onda" then recombine the prefix in any results. But the example given
> wanted to match something like "V6" in the middle of the text and I do not
> see how that would work as you would now need to search 26 dictionaries
> completely.


Well, that's the question, isn't it?  Just how is this expected to be 
used?  I didn't read the initial posting that carefully, and I may have 
missed something that makes a difference.


The OP gives as an example a user entering a string ("v60").  The 
example is for a model designation.  If we know that this entry box will 
only receive a model, then I would populate a dictionary using the model 
numbers as keys.  The number of distinct keys will probably not be that 
large.


For example, highly simplified of course:

>>> models = {'v60': 'Volvo', 'GV60': 'Genesis', 'cl': 'Acura'}
>>> entry = '60'
>>> candidates = (m for m in models.keys() if entry in m)
>>> list(candidates)
['v60', 'GV60']

The keys would be lower-cased.  A separate dictionary would give the 
complete string with the desired casing.  The values could be object 
references to the complete information.  If there might be several 
different models with the same key, then the values could be 
lists or dictionaries and one would need to do some disambiguation, but 
that should be simple and quick.
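
A sketch of that disambiguation, with invented sample data:

from collections import defaultdict

index = defaultdict(list)
for make, model in [("Volvo", "V60"), ("Genesis", "GV60"), ("Acura", "CL")]:
    index[model.lower()].append((make, model))   # lower-cased key -> full records

entry = "60"
matches = [rec for key, recs in index.items() if entry in key for rec in recs]
print(matches)   # [('Volvo', 'V60'), ('Genesis', 'GV60')]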


It all depends on the planned access patterns.  If the OP really wants 
full-text search in the complete unstructured data file, then yes, a 
full text indexer of some kind will be useful.  Whoosh certainly looks 
good though I have not used it.  But for populating dropdown lists in 
web forms, most likely the design of the form will provide a structure 
for the various searches.



-Original Message-
From: Python-list  On 
Behalf Of Thomas Passin
Sent: Monday, March 6, 2023 11:03 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:

Not sure if this is what Thomas meant, but I was also thinking dictionaries.

Dino could build a set of dictionaries with keys “a” through “z” that contain 
data with those letters in them. (I’m assuming case insensitive search) and 
then just search “v” if that’s what the user starts with.

Increased performance may be achieved by building dictionaries “aa”, “ab” ... 
“zz”. And so on.

Of course, it’s trading CPU for memory usage, and there’s likely a point at 
which the cost of building dictionaries exceeds the savings in searching.


Chances are it would only be seconds at most to build the data cache,
and then subsequent queries would respond very quickly.
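
A rough sketch of such a cache (invented sample data; intersecting
single-character buckets also covers multi-character entries like "v6"):

from collections import defaultdict

cars = ["Genesis,GV60", "Volvo,V60", "Honda,Civic"]   # stand-in sample data
buckets = defaultdict(set)
for line in cars:
    for ch in set(line.lower()):
        if ch.isalnum():
            buckets[ch].add(line)      # index: every line containing this character

candidates = buckets["v"] & buckets["6"]              # shrink the search space first
print([line for line in candidates if "v6" in line.lower()])   # order may vary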



From: Python-list  on behalf of 
Thomas Passin 
Date: Sunday, March 5, 2023 at 9:07 PM
To: python-list@python.org 
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

I would probably ingest the data at startup into a dictionary - or
perhaps several depending on your access patterns - and then you will
only need to do a fast lookup in one or more dictionaries.

If your access pattern would be easier with SQL queries, load the data
into an SQLite database on startup.

IOW, do the bulk of the work once at startup.
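
A minimal sketch of the SQLite route (the two-column CSV layout and the
file name are assumptions; note that SQLite's LIKE is already
case-insensitive for ASCII):

import csv, sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (make TEXT, model TEXT)")
with open("all_cars_unique.csv", newline="") as fh:
    con.executemany("INSERT INTO cars VALUES (?, ?)", csv.reader(fh))
print(con.execute("SELECT make, model FROM cars WHERE model LIKE ?",
                  ("%v60%",)).fetchall())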
--
https://mail.python.org/mailman/listinfo/python-list




--
https://mail.python.org/mailman/listinfo/python-list


RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread avi.e.gross
Ah, thanks Dino. Autocomplete within a web page can be an interesting
scenario but also a daunting one.

Now, do you mean you have a web page with a text field, initially I suppose
empty, and the user types a single character and rapidly a drop-down list or
something is created and shown? And as they type, it may shrink? And as soon
as they select one, it is replaced in the text field and done?

If your form has an attached function written in JavaScript, one might load
your data into the browser and do all that work from within. No Python
needed.

Now if your scenario is similar to the above, or perhaps the user needs to
ask for autocompletion by using tab or something, and you want to keep
sending requests to a server, you can of course use any language on the
server. BUT I would be cautious in such a design.

My guess is you autocomplete on every keystroke and the user may well type
multiple characters resulting in multiple requests for your program. Is a
new one called every time, or is it a running service? If the latter, it pays
to read in the data once and then carefully serve it. But when you get just
the letter "h" you may not want to send and process a thousand results but
limit it to, say, the first N. If they then add an "o" to make "ho", you may
not need to do much if it is anchored to the start, except to search in the
results of the previous search rather than the whole data.

But have you done some searching on how autocomplete from a fixed corpus is
normally done? It is a quite common thing.


-Original Message-
From: Python-list  On
Behalf Of Dino
Sent: Monday, March 6, 2023 7:40 AM
To: python-list@python.org
Subject: Re: RE: Fast full-text searching in Python (job for Whoosh?)

Thank you for taking the time to write such a detailed answer, Avi. And 
apologies for not providing more info from the get go.

What I am trying to achieve here is supporting autocomplete (no pun 
intended) in a web form field, hence the -i case insensitive example in 
my initial question.

Your points are all good, and my original question was a bit rushed. I 
guess that the problem was that I saw this video:

https://www.youtube.com/watch?v=gRvZbYtwTeo&ab_channel=NextDayVideo

The idea that someone types into an input field and matches start 
dancing in the browser made me think that this was exactly what I 
needed, and hence I figured that asking here about Whoosh would be a 
good idea. I now realize that Whoosh would be overkill for my use-case, 
as a simple (case-insensitive) substring query would get me 90% of what 
I want. Speed is in the order of a few milliseconds out of the box, 
which is chump change in the context of a web UI.

Thank you again for taking the time to look at my question

Dino

On 3/5/2023 10:56 PM, avi.e.gr...@gmail.com wrote:
> Dino, Sending lots of data to an archived forum is not a great idea. I
> snipped most of it out below as not to replicate it.
> 
> Your question does not look difficult unless your real question is about
> speed. Realistically, much of the time spent generally is in reading in a
> file and the actual search can be quite rapid with a wide range of methods.
> 
> The data looks boring enough and seems to not have much structure other than
> one comma possibly separating two fields. Do you want the data as one wide
> field or perhaps in two parts, which a CSV file is normally used to
> represent? Do you ever have questions like tell me all cars whose name
> begins with the letter D and has a V6 engine? If so, you may want more than
> a vanilla search.
> 
> What exactly do you want to search for? Is it a set of built-in searches or
> something the user types in?
> 
> The data seems to be sorted by the first field and then by the second and I
> did not check if some searches might be ambiguous. Can there be many entries
> containing III? Yep. Can the same words like Cruiser or Hybrid appear?
> 
> So is this a one-time search or multiple searches once loaded, as in a
> service that stays resident and fields requests? The latter may be worth
> speeding up.
> 
> I don't NEED to know any of this but want you to know that the answer may
> depend on this and similar factors. We had a long discussion lately on
> whether to search using regular expressions or string methods. If your data
> is meant to be used once, you may not even need to read the file into
> memory, but read something like a line at a time and test it. Or, if you end
> up with more data like how many cylinders a car has, it may be time to read
> it in not just to a list of lines or such data structures, but get
> numpy/pandas involved and use their many search methods in something like a
> data.frame.
> 
> Of course if you are worried about portability, keep using Get Regular
> Expression Print.
> 
> Your example was:
> 
>   $ grep -i v60 all_cars_unique.csv
>   Genesis,GV60
>   Volvo,V60
> 
> You seem to have wanted case folding and that is NOT a normal search. And
> your search is matching anything on any line.

Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-06 Thread Dieter Maurer
aapost wrote at 2023-3-5 09:35 -0500:
> ...
>If a file is still open, even if all the operations on the file have
>ceased for a time, the tail of the written operation data does not get
>flushed to the file until close is issued and the file closes cleanly.

This is normal: the buffer is flushed when one of the following conditions
is met:
1. you call `flush`
2. the buffer overflows
3. the file is closed.
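
A small sketch that makes all three conditions visible (the default
buffer size is platform-dependent; roughly 8 KiB is typical for CPython):

import os

f = open("abc", "w")
f.write("x" * 100)               # smaller than the buffer: stays in memory
print(os.path.getsize("abc"))    # typically 0 - nothing on disk yet
f.flush()                        # condition 1: explicit flush
print(os.path.getsize("abc"))    # 100
f.write("y" * 100000)            # condition 2: a write this large overflows the buffer
f.close()                        # condition 3: close flushes whatever remains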
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Thomas Passin

On 3/6/2023 7:28 AM, Dino wrote:

> On 3/5/2023 9:05 PM, Thomas Passin wrote:
>> I would probably ingest the data at startup into a dictionary - or
>> perhaps several depending on your access patterns - and then you will
>> only need to do a fast lookup in one or more dictionaries.
>>
>> If your access pattern would be easier with SQL queries, load the data
>> into an SQLite database on startup.
>
> Thank you. SQLite would be overkill here, plus all the machinery that I
> would need to set up to make sure that the DB is rebuilt/updated regularly.
>
> Do you happen to know something about Whoosh? have you ever used it?

Do you happen to know something about Whoosh? have you ever used it?


I know nothing about it, sorry.  But anything beyond python dictionaries 
and possibly some lists strikes me as overkill for what you have described.

>> IOW, do the bulk of the work once at startup.
>
> Sound advice
>
> Thank you

--
https://mail.python.org/mailman/listinfo/python-list


RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread avi.e.gross
Thomas,

I may have missed any discussion where the OP explained more about proposed 
usage. If the program is designed to load the full data once, never get updates 
except by re-reading some file, and then handles multiple requests, then some 
things may be worth doing.

It looked to me, and I may well be wrong, like he wanted to search for a string 
anywhere in the text so a grep-like solution is a reasonable start with the 
actual data being stored as something like a list of character strings you can 
search "one line" at a time. I suspect a numpy variant may work faster.

And of course any search function he builds can be made to remember some or all 
previous searches using a cache decorator. That generally uses a dictionary for 
the search keys internally.

But using lots of dictionaries strikes me as only helping if you are searching 
for text anchored to the start of a line so if you ask for "Honda" you instead 
ask the dictionary called "h" and search perhaps just for "onda" then recombine 
the prefix in any results. But the example given wanted to match something like 
"V6" in the middle of the text and I do not see how that would work as you 
would now need to search 26 dictionaries completely.



-Original Message-
From: Python-list  On 
Behalf Of Thomas Passin
Sent: Monday, March 6, 2023 11:03 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:
> Not sure if this is what Thomas meant, but I was also thinking dictionaries.
> 
> Dino could build a set of dictionaries with keys “a” through “z” that contain 
> data with those letters in them. (I’m assuming case insensitive search) and 
> then just search “v” if that’s what the user starts with.
> 
> Increased performance may be achieved by building dictionaries “aa”, “ab” ... 
> “zz”. And so on.
> 
> Of course, it’s trading CPU for memory usage, and there’s likely a point at 
> which the cost of building dictionaries exceeds the savings in searching.

Chances are it would only be seconds at most to build the data cache, 
and then subsequent queries would respond very quickly.

> 
> From: Python-list  on 
> behalf of Thomas Passin 
> Date: Sunday, March 5, 2023 at 9:07 PM
> To: python-list@python.org 
> Subject: Re: Fast full-text searching in Python (job for Whoosh?)
> 
> I would probably ingest the data at startup into a dictionary - or
> perhaps several depending on your access patterns - and then you will
> only need to do a fast lookup in one or more dictionaries.
> 
> If your access pattern would be easier with SQL queries, load the data
> into an SQLite database on startup.
> 
> IOW, do the bulk of the work once at startup.
> --
> https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread avi.e.gross
Gerard,

I was politely pointing out how it was more than the minimum necessary and
might get repeated multiple times as people replied. The storage space is a
resource someone else provides and I prefer not abusing it.

However, since the OP seems to be asking a question focused on how long it
takes to search using possible techniques, indeed some people would want the
entire data to test with.

In my personal view, a snippet of the data is what I need to see how it
is organized, and then what I need far more is some idea of what kind of
searching is needed.

If I was told there would be a web page allowing users to search a web
service hosting the data on a server with one process called as much as
needed that spawned threads to handle the task, I might see it as very
worthwhile to read in the data once into some data structure that allows
rapid searches over and over.  If it is an app called ONCE as a whole for
each result, as in the grep example, why bother and just read a line at a
time and be done with it.

My suggestion remains my preference. The discussion is archived. Messages
can optimally be trimmed as needed and not allowed to contain the full
contents of the last twenty replies back and forth unless that is needed.
Larger amounts of data can be offered to share and, if wanted, can be posted
or sent to someone asking for it or placed in some publicly accessible place.

But my preference may not be relevant as the forum has hosts or owners and
it is what they want that counts.

The data this time was not really gigantic. But I often work with data from
a CSV that has hundreds of columns and hundreds of thousands or more rows,
with some of the columns containing large amounts of text. But I may be
interested in how to work with say just half a dozen columns and for the
purposes of my question here, perhaps a hundred representative rows. Should
I share everything, or maybe save the subset and only share that?

This is not about python as a language but about expressing ideas and
opinions on a public forum with limited resources. Yes, over the years, my
combined posts probably use far more archival space. We are not asked to be
sparse, just not be wasteful. 

The OP may consider what he is working with as a LOT of data but it really
isn't by modern standards. 

-Original Message-
From: Python-list  On
Behalf Of Weatherby,Gerard
Sent: Monday, March 6, 2023 10:35 AM
To: python-list@python.org
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

"Dino, Sending lots of data to an archived forum is not a great idea. I
snipped most of it out below as not to replicate it."

Surely in 2023, storage is affordable enough there's no need to criticize
Dino for posting complete information. If mailing space is a consideration,
we could all help by keeping our replies short and to the point.

-- 
https://mail.python.org/mailman/listinfo/python-list

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/5/2023 9:05 PM, Thomas Passin wrote:

> I would probably ingest the data at startup into a dictionary - or
> perhaps several depending on your access patterns - and then you will
> only need to do a fast lookup in one or more dictionaries.
>
> If your access pattern would be easier with SQL queries, load the data
> into an SQLite database on startup.

Thank you. SQLite would be overkill here, plus all the machinery that I 
would need to set up to make sure that the DB is rebuilt/updated regularly.

Do you happen to know something about Whoosh? have you ever used it?

> IOW, do the bulk of the work once at startup.


Sound advice

Thank you
--
https://mail.python.org/mailman/listinfo/python-list


Re: Cutting slices

2023-03-06 Thread Christian Gollwitzer

Am 05.03.23 um 23:43 schrieb Stefan Ram:

   The following behaviour of Python strikes me as being a bit
   "irregular". A user tries to chop off sections from a string,
   but does not use "split" because the separator might become
   more complicated so that a regular expression will be required
   to find it.


OK, so if you want to use an RE for splitting, can you not use 
re.split() ? It basically works like the built-in splitting in AWK


>>> s='alphaAbetaBgamma'
>>> import re
>>> re.split(r'A|B|C', s)
['alpha', 'beta', 'gamma']
>>>
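
And if the separators themselves are needed, a capturing group keeps
them in the result:

>>> re.split(r'(A|B|C)', s)
['alpha', 'A', 'beta', 'B', 'gamma']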


Christian
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread rbowman
On Mon, 6 Mar 2023 07:40:29 -0500, Dino wrote:

> The idea that someone types into an input field and matches start
> dancing in the browser made me think that this was exactly what I
> needed, and hence I figured that asking here about Whoosh would be a
> good idea. I know realize that Whoosh would be overkill for my use-case,
> as a simple (case insensitive) query substring would get me 90% of what
> I want. Speed is in the order of a few milliseconds out of the box,
> which is chump change in the context of a web UI.

For a web application the round trips to the server for the next set of 
suggestions swamp out the actual lookups. Use the developer console in 
your browser to look at the network traffic and you'll see it's busy.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread rbowman
On Mon, 6 Mar 2023 15:32:09 +, Weatherby,Gerard wrote:


> Increased performance may be achieved by building dictionaries “aa”, “ab”
> ... “zz”. And so on.

Or a trie. There have been several implementations but I believe this is 
the most active:

https://pypi.org/project/PyTrie/
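
A dependency-free sketch of the idea (PyTrie offers a maintained, richer 
equivalent):

def make_trie(words):
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = w                    # "$" marks a complete word
    return root

def prefixed(root, prefix):
    node = root
    for ch in prefix:                    # walk down to the prefix's subtree
        if ch not in node:
            return []
        node = node[ch]
    words, stack = [], [node]
    while stack:                         # collect every word below that node
        n = stack.pop()
        words.extend(v for k, v in n.items() if k == "$")
        stack.extend(v for k, v in n.items() if k != "$")
    return words

trie = make_trie(["main", "manson", "maple", "massachusetts", "masten"])
print(prefixed(trie, "mas"))             # ['massachusetts', 'masten'] (order may vary)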
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-06 Thread aapost

On 3/5/23 19:02, Cameron Simpson wrote:

On 05Mar2023 10:38, aapost  wrote:

Additionally (not sure if this still applies):
flush() does not necessarily write the file’s data to disk. Use 
flush() followed by os.fsync() to ensure this behavior.


Yes. You almost _never_ need or want this behaviour. A database tends to 
fsync at the end of a transaction and at other critical points.


However, once you've `flush()`ed the file the data are then in the hands 
of the OS, to get to disc in a timely but efficient fashion. Calling 
fsync(), like calling flush(), affects writing _efficiency_ by depriving 
the OS (or for flush(), the Python I/O buffering system) of the opportunity 
to bundle further data efficiently. It will degrade the overall performance.


Also, fsync() need not expedite the data getting to disc. It is equally 
valid that it just blocks your programme _until_ the data have gone to 
disc. In practice it probably does expedite things slightly, but the real 
world effect is that your programme will gratuitously block anyway, when 
it could just get on with its work, secure in the knowledge that the OS 
has its back.


flush() is for causality - ensuring the data are on their way so that 
some external party _will_ see them rather than waiting forever for data 
which are lurking in the buffer.  If that external party, for you, is an 
end user tailing a log file, then you might want to flush() at the end 
of every line.  Note that there is a presupplied line-buffering mode you 
can choose which will cause a file to flush like that for you 
automatically.


So when you flush is a policy decision which you can make either during 
the programme flow or to a less flexible degree when you open the file.


As an example of choosing-to-flush, here's a little bit of code in a 
module I use for writing packet data to a stream (eg a TCP connection):

https://github.com/cameron-simpson/css/blob/00ab1a8a64453dc8a39578b901cfa8d1c75c3de2/lib/python/cs/packetstream.py#L624

Starting at line 640: `if Q.empty():` it optionally pauses briefly to 
see if more packets are coming on the source queue. If another arrives, 
the flush() is _skipped_, and the decision to flush made again after the 
next packet is transcribed. In this way a busy source of packets can 
write maximally efficient data (full buffers) as long as there's new 
data coming from the queue, but if the queue is empty and stays empty 
for more than `grace` seconds we flush anyway so that the receiver 
_will_ still see the latest packet.
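
In miniature, the policy looks something like this (a hedged sketch, not 
the actual cs.packetstream code; the queue and the None sentinel are 
invented for the example):

import queue

def pump(q, out, grace=0.1):
    """Copy items from q to out, flushing only after `grace` seconds of quiet."""
    while True:
        try:
            item = q.get(timeout=grace)
        except queue.Empty:
            out.flush()                  # source went idle: make the data visible
            item = q.get()               # block until more work (or the sentinel)
        if item is None:                 # sentinel: end of stream
            out.flush()
            return
        out.write(item)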


Cheers,
Cameron Simpson 


Thanks for the details. And yes, that above quote was from a 
non-official doc without a version reference that several forum posts 
were referencing, with no further reasoning as to why they make the 
suggestion or what importance it has (for the uninformed trying to 
parse it, the suggestion could be because of anything, like python 
lacking something that maybe was fixed, or who knows). Thanks.



--
https://mail.python.org/mailman/listinfo/python-list


Re: RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino
Thank you for taking the time to write such a detailed answer, Avi. And 
apologies for not providing more info from the get go.


What I am trying to achieve here is supporting autocomplete (no pun 
intended) in a web form field, hence the -i case insensitive example in 
my initial question.


Your points are all good, and my original question was a bit rushed. I 
guess that the problem was that I saw this video:


https://www.youtube.com/watch?v=gRvZbYtwTeo&ab_channel=NextDayVideo

The idea that someone types into an input field and matches start 
dancing in the browser made me think that this was exactly what I 
needed, and hence I figured that asking here about Whoosh would be a 
good idea. I now realize that Whoosh would be overkill for my use-case, 
as a simple (case-insensitive) substring query would get me 90% of what 
I want. Speed is in the order of a few milliseconds out of the box, 
which is chump change in the context of a web UI.


Thank you again for taking the time to look at my question

Dino

On 3/5/2023 10:56 PM, avi.e.gr...@gmail.com wrote:

Dino, Sending lots of data to an archived forum is not a great idea. I
snipped most of it out below as not to replicate it.

Your question does not look difficult unless your real question is about
speed. Realistically, much of the time spent generally is in reading in a
file and the actual search can be quite rapid with a wide range of methods.

The data looks boring enough and seems to not have much structure other than
one comma possibly separating two fields. Do you want the data as one wide
field or perhaps in two parts, which a CSV file is normally used to
represent? Do you ever have questions like tell me all cars whose name
begins with the letter D and has a V6 engine? If so, you may want more than
a vanilla search.

What exactly do you want to search for? Is it a set of built-in searches or
something the user types in?

The data seems to be sorted by the first field and then by the second and I
did not check if some searches might be ambiguous. Can there be many entries
containing III? Yep. Can the same words like Cruiser or Hybrid appear?

So is this a one-time search or multiple searches once loaded, as in a
service that stays resident and fields requests? The latter may be worth
speeding up.

I don't NEED to know any of this but want you to know that the answer may
depend on this and similar factors. We had a long discussion lately on
whether to search using regular expressions or string methods. If your data
is meant to be used once, you may not even need to read the file into
memory, but read something like a line at a time and test it. Or, if you end
up with more data like how many cylinders a car has, it may be time to read
it in not just to a list of lines or such data structures, but get
numpy/pandas involved and use their many search methods in something like a
data.frame.

Of course if you are worried about portability, keep using Get Regular
Expression Print.

Your example was:

  $ grep -i v60 all_cars_unique.csv
  Genesis,GV60
  Volvo,V60

You seem to have wanted case folding and that is NOT a normal search. And
your search is matching anything on any line. If you wanted only a complete
field, such as all text after a comma to the end of the line, you could use
grep specifications to say that.

But once inside python, you would need to make choices depending on what
kind of searches you want to allow but also things like do you want all
matching lines shown if you search for say "a" ...



--
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-06 Thread aapost

On 3/5/23 09:35, aapost wrote:
I have run into this a few times and finally reproduced it. Whether it 
is as expected I am not sure since it is slightly on the user, but I can 
think of scenarios where this would be undesirable behavior. This 
occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the 
reasoning lingers somewhere else.


If a file is still open, even if all the operations on the file have 
ceased for a time, the tail of the written operation data does not get 
flushed to the file until close is issued and the file closes cleanly.


2 methods to recreate - 1st run from interpreter directly:

f = open("abc", "w")
for i in range(50000):
    f.write(str(i) + "\n")

you can cat the file and see it stops at 49626 until you issue an f.close()

a script to recreate:

f = open("abc", "w")
for i in range(50000):
    f.write(str(i) + "\n")
while(1):
   pass

cat out the file and same thing, stops at 49626. a ctrl-c exit closes 
the files cleanly, but if the process exits uncleanly, i.e. via a kill 
command or something else catastrophic, the remaining buffer is lost.


Of course one SHOULD manage the closing of their files and this is 
partially on the user, but if by design something is hanging on to a 
file while it is waiting for something, and then a crash occurs, they lose 
a portion of what was assumed already complete...


>Cameron
>Eryk

Yeah, I later noticed open() has the buffering option in the docs, and 
the warning on a subsequent page:


Warning
Calling f.write() without using the with keyword or calling f.close() 
might result in the arguments of f.write() not being completely written 
to the disk, even if the program exits successfully.


I will have to set the buffering arg to 1. I just hadn't thought about 
buffering in quite a while since python just handles most of the things 
lower-level languages don't. I guess my (of course incorrect) 
assumptions would have leaned toward some sort of auto handling of the 
flush, or a non-buffered default (not saying it should be).


And I understand why it is the way it is from a developer standpoint, 
it's sort of a mental thing in the moment, I was in a sysadmin way of 
thinking, switching around from doing things in bash with multiple 
terminals, forgetting the fundamentals of what the python interpreter is 
vs a sequence of terminal commands.


That being said, while "with" is great for many use cases, I think its 
overuse causes concepts like flush and the underlying "whys" to atrophy 
(especially since it is obviously a concept that is still important). It 
also doesn't work well when doing quick and dirty work in the 
interpreter to build a file on the fly with a sequence of commands you 
haven't completely thought through yet, in addition to the not wanting 
to close yet, the subsequent indentation requirement is annoying. f = 
open("fn", "w", 1) will be the go-to for that type of work since now I 
know. Again, just nitpicking, lol.
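
For reference, the two styles side by side (file names invented; a rough 
sketch):

f = open("fn", "w", 1)             # line-buffered text mode: each "\n" flushes
f.write("hello\n")                 # visible to cat immediately
f.close()

with open("fn2", "w") as f:        # default block buffering; the with-block
    f.write("world\n")             # guarantees close() and hence the final flush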


--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/5/2023 1:19 AM, Greg Ewing wrote:

> I just did a similar test with your actual data and got
> about the same result. If that's fast enough for you,
> then you don't need to do anything fancy.


thank you, Greg. That's what I am going to do in fact.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-06 Thread Weatherby,Gerard
Add f.reconfigure if you want line buffering in your example:

f = open("abc", "w")
f.reconfigure(line_buffering=True)
for i in range(50000):
    f.write(str(i) + "\n")

More Pythonic would be:

with open("abc", "w") as f:
    for i in range(50000):
        print(i, file=f)

From: Python-list  on 
behalf of aapost 
Date: Sunday, March 5, 2023 at 6:33 PM
To: python-list@python.org 
Subject: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

I have run into this a few times and finally reproduced it. Whether it
is as expected I am not sure since it is slightly on the user, but I can
think of scenarios where this would be undesirable behavior. This
occurs on 3.11.1 and 3.11.2 using debian 12 testing, in case the
reasoning lingers somewhere else.

If a file is still open, even if all the operations on the file have
ceased for a time, the tail of the written operation data does not get
flushed to the file until close is issued and the file closes cleanly.

2 methods to recreate - 1st run from interpreter directly:

f = open("abc", "w")
for i in range(50000):
    f.write(str(i) + "\n")

you can cat the file and see it stops at 49626 until you issue an f.close()

a script to recreate:

f = open("abc", "w")
for i in range(50000):
    f.write(str(i) + "\n")
while(1):
   pass

cat out the file and same thing, stops at 49626. a ctrl-c exit closes
the files cleanly, but if the process exits uncleanly, i.e. via a kill
command or something else catastrophic, the remaining buffer is lost.

Of course one SHOULD manage the closing of their files and this is
partially on the user, but if by design something is hanging on to a
file while it is waiting for something, and then a crash occurs, they lose
a portion of what was assumed already complete...
--
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Thomas Passin

On 3/6/2023 10:32 AM, Weatherby,Gerard wrote:

Not sure if this is what Thomas meant, but I was also thinking dictionaries.

Dino could build a set of dictionaries with keys “a” through “z” that contain 
data with those letters in them. (I’m assuming case insensitive search) and 
then just search “v” if that’s what the user starts with.

Increased performance may be achieved by building dictionaries “aa”, “ab” ... 
“zz”. And so on.

Of course, it’s trading CPU for memory usage, and there’s likely a point at 
which the cost of building dictionaries exceeds the savings in searching.


Chances are it would only be seconds at most to build the data cache, 
and then subsequent queries would respond very quickly.




From: Python-list  on behalf of 
Thomas Passin 
Date: Sunday, March 5, 2023 at 9:07 PM
To: python-list@python.org 
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

I would probably ingest the data at startup into a dictionary - or
perhaps several depending on your access patterns - and then you will
only need to do a fast lookup in one or more dictionaries.

If your access pattern would be easier with SQL queries, load the data
into an SQLite database on startup.

IOW, do the bulk of the work once at startup.
--
https://mail.python.org/mailman/listinfo/python-list


--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Weatherby,Gerard
“Dino, Sending lots of data to an archived forum is not a great idea. I
snipped most of it out below as not to replicate it.”

Surely in 2023, storage is affordable enough there’s no need to criticize Dino 
for posting complete information. If mailing space is a consideration, we could 
all help by keeping our replies short and to the point.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Weatherby,Gerard
Not sure if this is what Thomas meant, but I was also thinking dictionaries.

Dino could build a set of dictionaries with keys “a” through “z” that contain 
data with those letters in them. (I’m assuming case insensitive search) and 
then just search “v” if that’s what the user starts with.

Increased performance may be achieved by building dictionaries “aa”, “ab” ... 
“zz”. And so on.

Of course, it’s trading CPU for memory usage, and there’s likely a point at 
which the cost of building dictionaries exceeds the savings in searching.


From: Python-list  on 
behalf of Thomas Passin 
Date: Sunday, March 5, 2023 at 9:07 PM
To: python-list@python.org 
Subject: Re: Fast full-text searching in Python (job for Whoosh?)

I would probably ingest the data at startup into a dictionary - or
perhaps several depending on your access patterns - and then you will
only need to do a fast lookup in one or more dictionaries.

If your access pattern would be easier with SQL queries, load the data
into an SQLite database on startup.

IOW, do the bulk of the work once at startup.
--
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Bug 3.11.x behavioral, open file buffers not flushed til file closed.

2023-03-06 Thread Barry


> On 6 Mar 2023, at 01:42, Greg Ewing via Python-list  
> wrote:
> 
> On 6/03/23 1:02 pm, Cameron Simpson wrote:
>> Also, fsync() need not expedite the data getting to disc. It is equally 
>> valid that it just blocks your programme _until_ the data have gone to disc.
> 
> Or until it *thinks* the data has gone to the disk. Some drives
> do buffering of their own, which may impose additional delays
> before the data actually gets written.

This used to be an issue until Microsoft refused to certify (via WHQL) any 
drive that lied about when data was persisted to the medium.

That had the effect of stopping drive manufacturers from shipping firmware 
that lied in order to win benchmarks.

Now the OS will use the commands to the drive that allow the OS to know the 
data is safe.

Barry

> 
> -- 
> Greg
> -- 
> https://mail.python.org/mailman/listinfo/python-list
> 

-- 
https://mail.python.org/mailman/listinfo/python-list