Re: RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-08 Thread Dino

On 3/7/2023 2:02 PM, avi.e.gr...@gmail.com wrote:

Some of the discussions here leave me confused as the info we think we got
early does not last long intact and often morphs into something else and we
find much of the discussion is misdirected or wasted.



Apologies. I'm the OP and also the OS (original sinner). My "mistake" 
was to go for a "stream of consciousness" kind of question, rather than 
a well-researched and thought-out one.


You are correct, Avi. I have a simple web UI, I came across the Whoosh 
video and got infatuated with the idea that Whoosh could be used to 
create an autofill function, as my backend is already Python/Flask. As 
many have observed and as I quickly realized myself, Whoosh was 
overkill for my use case. In the meantime people started asking 
questions, I responded and, before you know it, we were all discussing 
the intricacies of JavaScript web development in a Python forum. Should 
I have stopped them? How?


One thing is for sure: I am really grateful that so many used so much of 
their time to help.


A big thank you to each of you, friends.

Dino


--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-08 Thread Dino

On 3/7/2023 1:28 PM, David Lowry-Duda wrote:

But I'll note that I use whoosh from time to time and I find it stable 
and pleasant to work with. It's true that development stopped, but it 
stopped in a very stable place. I don't recommend using whoosh here, but 
I would recommend experimenting with it more generally.


Thank you, David. Noted.


--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-07 Thread Dino

On 3/6/2023 11:05 PM, rbowman wrote:


It must be nice to have a server or two...


No kidding

About everything else you wrote: it makes a ton of sense, and in fact it's a 
dilemma I am facing now. My back-end returns 10 entries (I am limiting 
matches to a maximum of 10 server side, for reasons you can imagine).
As the user keeps typing, should I narrow the existing result set 
based on the new information, or re-issue an API call to the server?
Things get confusing pretty fast for the user. You don't want too many 
cooks in the kitchen, I guess.
I played a little bit with both approaches in my little application. 
Re-requesting from the server seems to win hands down in my case.
I am sure the Google engineers have reached spectacular levels of UI 
finesse with stuff like this.



On Mon, 6 Mar 2023 21:55:37 -0500, Dino wrote:


https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript


That could be annoying. My use case is address entry. When the user types

102 ma

the suggestions might be

main
manson
maple
massachusetts
masten

in a simple case. When they enter 's' it's narrowed down. Typically I'm
only dealing with a city or county so the data to be searched isn't huge.
The maps.google.com address search covers the world and they're also
throwing in a geographical constraint so the suggestions are applicable to
the area you're viewing.  



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/4/2023 10:43 PM, Dino wrote:


I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


Gentlemen, thanks a ton to everyone who offered to help (and did help!). 
I loved the part where some tried to divine the true meaning of my words :)


What you guys wrote is correct: the grep-esque search is guaranteed to 
turn up a ton of false positives, but for the autofill use case that's 
actually OK. Users will quickly figure out what is not relevant, skip 
those entries, and zero in on the suggestion they find relevant.


One issue that was also correctly foreseen by some is that there's going 
to be a new request at every user keystroke. Known problem. JavaScript 
programmers use a trick called "debouncing" to be reasonably sure that 
the user is done typing before a request is issued:


https://schier.co/blog/wait-for-user-to-stop-typing-using-javascript

I was able to apply that successfully and I am now very pleased with the 
final result.
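For anyone curious, the client-side debounce trick can also be sketched in Python with threading.Timer; the decorator and all names below are mine, not from the linked article:

```python
import threading
import time

def debounce(wait_seconds):
    """Delay calls to fn until wait_seconds elapse with no newer call."""
    def decorator(fn):
        timer = None
        lock = threading.Lock()

        def debounced(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()  # a newer "keystroke" supersedes this one
                timer = threading.Timer(wait_seconds, fn, args, kwargs)
                timer.start()
        return debounced
    return decorator

calls = []

@debounce(0.2)
def issue_request(query):
    calls.append(query)  # stand-in for the real HTTP request

for q in ["v", "v6", "v60"]:  # simulate rapid keystrokes
    issue_request(q)

time.sleep(0.5)  # let the debounce window elapse
print(calls)     # only the last "keystroke" fired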


Apologies for posting 1400 lines of data file. Seeing that certain 
newsgroups carry gigabytes of copyright-infringing material must have 
given me the wrong impression.


Thank you.

Dino

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/5/2023 9:05 PM, Thomas Passin wrote:


I would probably ingest the data at startup into a dictionary - or 
perhaps several depending on your access patterns - and then you will 
only need to do a fast lookup in one or more dictionaries.


If your access pattern would be easier with SQL queries, load the data 
into an SQLite database on startup.
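For reference, the quoted SQLite suggestion would only be a few lines; the table and column names below are made up:

```python
import csv
import io
import sqlite3

# in-memory database, rebuilt at application startup
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cars (make TEXT, model TEXT)")

# stand-in for reading all_cars_unique.csv from disk
sample = "Acura,CL\nAcura,ILX\nGenesis,GV60\nVolvo,V60\n"
rows = list(csv.reader(io.StringIO(sample)))
db.executemany("INSERT INTO cars VALUES (?, ?)", rows)

def suggest(term, limit=10):
    # LIKE is case-insensitive for ASCII by default in SQLite
    cur = db.execute(
        "SELECT make, model FROM cars WHERE model LIKE ? LIMIT ?",
        (f"%{term}%", limit))
    return [{"manufacturer": mk, "model": md} for mk, md in cur]

print(suggest("v60"))
```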


Thank you. SQLite would be overkill here, plus all the machinery that I 
would need to set up to make sure that the DB is rebuilt/updated regularly.

Do you happen to know something about Whoosh? have you ever used it?


IOW, do the bulk of the work once at startup.


Sound advice

Thank you
--
https://mail.python.org/mailman/listinfo/python-list


Re: RE: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino
Thank you for taking the time to write such a detailed answer, Avi. And 
apologies for not providing more info from the get go.


What I am trying to achieve here is supporting autocomplete (no pun 
intended) in a web form field, hence the -i case insensitive example in 
my initial question.


Your points are all good, and my original question was a bit rushed. I 
guess that the problem was that I saw this video:


https://www.youtube.com/watch?v=gRvZbYtwTeo&ab_channel=NextDayVideo

The idea that someone types into an input field and matches start 
dancing in the browser made me think that this was exactly what I 
needed, and hence I figured that asking here about Whoosh would be a 
good idea. I now realize that Whoosh would be overkill for my use case, 
as a simple (case-insensitive) substring query gets me 90% of what 
I want. Speed is on the order of a few milliseconds out of the box, 
which is chump change in the context of a web UI.
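A minimal sketch of that substring approach, as it might sit behind a Flask route (the data and all names are made up):

```python
import json

# stand-in for the ~1400-row CSV, loaded once at startup
CARS = [("Genesis", "GV60"), ("Volvo", "V60"), ("Volvo", "V90"), ("Acura", "ILX")]

# precompute lowercase search keys so each request only scans and compares
INDEX = [(make, model, model.lower()) for make, model in CARS]

def autocomplete(term, limit=10):
    term = term.lower()
    hits = [{"model": model, "manufacturer": make}
            for make, model, key in INDEX if term in key]
    return hits[:limit]

# the Flask view would simply jsonify this result
print(json.dumps(autocomplete("v60")))
```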


Thank you again for taking the time to look at my question

Dino

On 3/5/2023 10:56 PM, avi.e.gr...@gmail.com wrote:

Dino, sending lots of data to an archived forum is not a great idea. I
snipped most of it out below so as not to replicate it.

Your question does not look difficult unless your real question is about
speed. Realistically, much of the time spent generally is in reading in a
file and the actual search can be quite rapid with a wide range of methods.

The data looks boring enough and seems to not have much structure other than
one comma possibly separating two fields. Do you want the data as one wide
field, or perhaps in two parts, which is what a CSV file normally
represents? Do you ever have questions like "tell me all cars whose name
begins with the letter D and has a V6 engine"? If so, you may want more than
a vanilla search.

What exactly do you want to search for? Is it a set of built-in searches or
something the user types in?

The data seems to be sorted by the first field and then by the second and I
did not check if some searches might be ambiguous. Can there be many entries
containing III? Yep. Can the same words like Cruiser or Hybrid appear?

So is this a one-time search, or multiple searches once loaded, as in a
service that stays resident and fields requests? The latter may be worth
speeding up.

I don't NEED to know any of this but want you to know that the answer may
depend on this and similar factors. We had a long discussion lately on
whether to search using regular expressions or string methods. If your data
is meant to be used once, you may not even need to read the file into
memory, but read something like a line at a time and test it. Or, if you end
up with more data like how many cylinders a car has, it may be time to read
it in not just to a list of lines or such data structures, but get
numpy/pandas involved and use their many search methods in something like a
data.frame.

Of course if you are worried about portability, keep using Get Regular
Expression Print.

Your example was:

  $ grep -i v60 all_cars_unique.csv
  Genesis,GV60
  Volvo,V60

You seem to have wanted case folding and that is NOT a normal search. And
your search is matching anything on any line. If you wanted only a complete
field, such as all text after a comma to the end of the line, you could use
grep specifications to say that.

But once inside python, you would need to make choices depending on what
kind of searches you want to allow but also things like do you want all
matching lines shown if you search for say "a" ...



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-06 Thread Dino

On 3/5/2023 1:19 AM, Greg Ewing wrote:

I just did a similar test with your actual data and got
about the same result. If that's fast enough for you,
then you don't need to do anything fancy.


thank you, Greg. That's what I am going to do in fact.

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino



Here's the complete data file should anyone care.

Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend
Acura,MDX
Acura,MDX Sport Hybrid
Acura,NSX
Acura,RDX
Acura,RL
Acura,RLX
Acura,RLX Sport Hybrid
Acura,RSX
Acura,SLX
Acura,TL
Acura,TLX
Acura,TSX
Acura,Vigor
Acura,ZDX
Alfa Romeo,164
Alfa Romeo,4C
Alfa Romeo,4C Spider
Alfa Romeo,Giulia
Alfa Romeo,Spider
Alfa Romeo,Stelvio
Alfa Romeo,Tonale
Aston Martin,DB11
Aston Martin,DB9
Aston Martin,DB9 GT
Aston Martin,DBS
Aston Martin,DBS Superleggera
Aston Martin,DBX
Aston Martin,Rapide
Aston Martin,Rapide S
Aston Martin,Vanquish
Aston Martin,Vanquish S
Aston Martin,Vantage
Aston Martin,Virage
Audi,100
Audi,80
Audi,90
Audi,A3
Audi,A3 Sportback e-tron
Audi,A4
Audi,A4 (2005.5)
Audi,A4 allroad
Audi,A5
Audi,A5 Sport
Audi,A6
Audi,A6 allroad
Audi,A7
Audi,A8
Audi,Cabriolet
Audi,Q3
Audi,Q4 Sportback e-tron
Audi,Q4 e-tron
Audi,Q5
Audi,Q5 Sportback
Audi,Q7
Audi,Q8
Audi,Quattro
Audi,R8
Audi,RS 3
Audi,RS 4
Audi,RS 5
Audi,RS 6
Audi,RS 7
Audi,RS Q8
Audi,RS e-tron GT
Audi,S3
Audi,S4
Audi,S4 (2005.5)
Audi,S5
Audi,S6
Audi,S7
Audi,S8
Audi,SQ5
Audi,SQ5 Sportback
Audi,SQ7
Audi,SQ8
Audi,TT
Audi,allroad
Audi,e-tron
Audi,e-tron GT
Audi,e-tron S
Audi,e-tron S Sportback
Audi,e-tron Sportback
BMW,1 Series
BMW,2 Series
BMW,3 Series
BMW,4 Series
BMW,5 Series
BMW,6 Series
BMW,7 Series
BMW,8 Series
BMW,Alpina B7
BMW,M
BMW,M2
BMW,M3
BMW,M4
BMW,M5
BMW,M6
BMW,M8
BMW,X1
BMW,X2
BMW,X3
BMW,X3 M
BMW,X4
BMW,X4 M
BMW,X5
BMW,X5 M
BMW,X6
BMW,X6 M
BMW,X7
BMW,Z3
BMW,Z4
BMW,Z4 M
BMW,Z8
BMW,i3
BMW,i4
BMW,i7
BMW,i8
BMW,iX
Bentley,Arnage
Bentley,Azure
Bentley,Azure T
Bentley,Bentayga
Bentley,Brooklands
Bentley,Continental
Bentley,Continental GT
Bentley,Flying Spur
Bentley,Mulsanne
Buick,Cascada
Buick,Century
Buick,Enclave
Buick,Encore
Buick,Encore GX
Buick,Envision
Buick,LaCrosse
Buick,LeSabre
Buick,Lucerne
Buick,Park Avenue
Buick,Rainier
Buick,Regal
Buick,Regal Sportback
Buick,Regal TourX
Buick,Rendezvous
Buick,Riviera
Buick,Roadmaster
Buick,Skylark
Buick,Terraza
Buick,Verano
Cadillac,ATS
Cadillac,ATS-V
Cadillac,Allante
Cadillac,Brougham
Cadillac,CT4
Cadillac,CT5
Cadillac,CT6
Cadillac,CT6-V
Cadillac,CTS
Cadillac,CTS-V
Cadillac,Catera
Cadillac,DTS
Cadillac,DeVille
Cadillac,ELR
Cadillac,Eldorado
Cadillac,Escalade
Cadillac,Escalade ESV
Cadillac,Escalade EXT
Cadillac,Fleetwood
Cadillac,LYRIQ
Cadillac,SRX
Cadillac,STS
Cadillac,Seville
Cadillac,Sixty Special
Cadillac,XLR
Cadillac,XT4
Cadillac,XT5
Cadillac,XT6
Cadillac,XTS
Chevrolet,1500 Extended Cab
Chevrolet,1500 Regular Cab
Chevrolet,2500 Crew Cab
Chevrolet,2500 Extended Cab
Chevrolet,2500 HD Extended Cab
Chevrolet,2500 HD Regular Cab
Chevrolet,2500 Regular Cab
Chevrolet,3500 Crew Cab
Chevrolet,3500 Extended Cab
Chevrolet,3500 HD Extended Cab
Chevrolet,3500 HD Regular Cab
Chevrolet,3500 Regular Cab
Chevrolet,APV Cargo
Chevrolet,Astro Cargo
Chevrolet,Astro Passenger
Chevrolet,Avalanche
Chevrolet,Avalanche 1500
Chevrolet,Avalanche 2500
Chevrolet,Aveo
Chevrolet,Beretta
Chevrolet,Blazer
Chevrolet,Blazer EV
Chevrolet,Bolt EUV
Chevrolet,Bolt EV
Chevrolet,Camaro
Chevrolet,Caprice
Chevrolet,Caprice Classic
Chevrolet,Captiva Sport
Chevrolet,Cavalier
Chevrolet,City Express
Chevrolet,Classic
Chevrolet,Cobalt
Chevrolet,Colorado Crew Cab
Chevrolet,Colorado Extended Cab
Chevrolet,Colorado Regular Cab
Chevrolet,Corsica
Chevrolet,Corvette
Chevrolet,Cruze
Chevrolet,Cruze Limited
Chevrolet,Equinox
Chevrolet,Equinox EV
Chevrolet,Express 1500 Cargo
Chevrolet,Express 1500 Passenger
Chevrolet,Express 2500 Cargo
Chevrolet,Express 2500 Passenger
Chevrolet,Express 3500 Cargo
Chevrolet,Express 3500 Passenger
Chevrolet,G-Series 1500
Chevrolet,G-Series 2500
Chevrolet,G-Series 3500
Chevrolet,G-Series G10
Chevrolet,G-Series G20
Chevrolet,G-Series G30
Chevrolet,HHR
Chevrolet,Impala
Chevrolet,Impala Limited
Chevrolet,Lumina
Chevrolet,Lumina APV
Chevrolet,Lumina Cargo
Chevrolet,Lumina Passenger
Chevrolet,Malibu
Chevrolet,Malibu (Classic)
Chevrolet,Malibu Limited
Chevrolet,Metro
Chevrolet,Monte Carlo
Chevrolet,Prizm
Chevrolet,S10 Blazer
Chevrolet,S10 Crew Cab
Chevrolet,S10 Extended Cab
Chevrolet,S10 Regular Cab
Chevrolet,SS
Chevrolet,SSR
Chevrolet,Silverado (Classic) 1500 Crew Cab
Chevrolet,Silverado (Classic) 1500 Extended Cab
Chevrolet,Silverado (Classic) 1500 HD Crew Cab
Chevrolet,Silverado (Classic) 1500 Regular Cab
Chevrolet,Silverado (Classic) 2500 HD Crew Cab
Chevrolet,Silverado (Classic) 2500 HD Extended Cab
Chevrolet,Silverado (Classic) 2500 HD Regular Cab
Chevrolet,Silverado (Classic) 3500 Crew Cab
Chevrolet,Silverado (Classic) 3500 Extended Cab
Chevrolet,Silverado (Classic) 3500 Regular Cab
Chevrolet,Silverado 1500 Crew Cab
Chevrolet,Silverado 1500 Double Cab
Chevrolet,Silverado 1500 Extended Cab
Chevrolet,Silverado 1500 HD Crew Cab
Chevrolet,Silverado 1500 LD Double Cab
Chevrolet,Silverado 1500 Limited Crew Cab
Chevrolet,Silverado 1500 Limited Double Cab
Chevrolet,Silverado 1500 Limited Regular Cab
Chevrolet,Silverado 1500 Regular Cab
Chevrolet,Silverado 2500 Crew Cab
Chevrolet,Silverado 250

Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino



I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


$ head all_cars_unique.csv
Acura,CL
Acura,ILX
Acura,Integra
Acura,Legend
Acura,MDX
Acura,MDX Sport Hybrid
Acura,NSX
Acura,RDX
Acura,RL
Acura,RLX
$ wc -l all_cars_unique.csv
1415 all_cars_unique.csv
$ grep -i v60 all_cars_unique.csv
Genesis,GV60
Volvo,V60
$

Essentially, I want my input field to suggest autofill options with data 
from this file/list. The user types "v60" and a REST endpoint will offer:


[
 {"model":"GV60", "manufacturer":"Genesis"},
 {"model":"V60", "manufacturer":"Volvo"}
]

i.e. a JSON response that I can use to generate the autofill with 
JavaScript. My Back-End is Python (Flask).


How can I implement this? A library called Whoosh seems very promising 
(albeit it's so feature-rich that it's almost like shooting a fly with a 
bazooka in my case), but I see two problems:


 1) Whoosh is either abandoned or the project is a mess in terms of 
community and support (https://groups.google.com/g/whoosh/c/QM_P8cGi4v4 
) and


 2) Whoosh seems to be a Python-only thing, which is great for now, but 
I wouldn't want this to become an obstacle should I need to port it to a 
different language at some point.


Are there other options out there that are fast? Can I "grep" through a 
data structure in Python... but faster?


Thanks

Dino

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast full-text searching in Python (job for Whoosh?)

2023-03-05 Thread Dino

On 3/4/2023 10:43 PM, Dino wrote:


I need fast text-search on a large (not huge, let's say 30k records 
totally) list of items. Here's a sample of my raw data (a list of US 
cars: model and make)


I suspect I am really close to answering my own question...

>>> import time
>>> lis = [str(a**2+a*3+a) for a in range(0,30000)]
>>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() - s);
753800
>>> s = time.process_time_ns(); res = [el for el in lis if "52356" in el]; print(time.process_time_ns() - s);
1068300
>>> s = time.process_time_ns(); res = [el for el in lis if "5256" in el]; print(time.process_time_ns() - s);
862000
>>> s = time.process_time_ns(); res = [el for el in lis if "6" in el]; print(time.process_time_ns() - s);
1447300
>>> s = time.process_time_ns(); res = [el for el in lis if "1" in el]; print(time.process_time_ns() - s);
1511100
>>> s = time.process_time_ns(); res = [el for el in lis if "13467" in el]; print(time.process_time_ns() - s); print(len(res), res[:10])
926900
2 ['134676021', '313467021']
>>>

I can do a substring search in a list of 30k elements in less than 2ms 
with Python. Is my reasoning sound?
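One refinement worth noting: timeit averages many runs and gives steadier numbers than a single process_time_ns delta. A sketch, assuming a 30k-element list like the above:

```python
import timeit

lis = [str(a**2 + a*3 + a) for a in range(0, 30000)]

def search(needle):
    return [el for el in lis if needle in el]

# average over 100 runs for a steadier number than one-off measurements
per_call = timeit.timeit(lambda: search("13467"), number=100) / 100
print(f"{per_call * 1000:.3f} ms per scan, {len(search('13467'))} hits")
```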


Dino


--
https://mail.python.org/mailman/listinfo/python-list


Re: LRU cache

2023-02-17 Thread Dino



Thank you, Gerard. I really appreciate your help

Dino

On 2/16/2023 9:40 PM, Weatherby,Gerard wrote:

I think this does the trick:

https://gist.github.com/Gerardwx/c60d200b4db8e7864cb3342dd19d41c9


#!/usr/bin/env python3
import collections
import random
from typing import Hashable, Any, Optional, Dict, Tuple


class LruCache:
    """Dictionary like storage of most recently inserted values"""

    def __init__(self, size: int = 1000):
        """:param size number of cached entries"""
        assert isinstance(size, int)
        self.size = size
        self.insert_counter = 0
        self.oldest = 0
        self._data: Dict[Hashable, Tuple[Any, int]] = {}  # store values and age index
        self._lru: Dict[int, Hashable] = {}  # age counter dictionary

    def insert(self, key: Hashable, value: Any) -> None:
        """Insert into dictionary"""
        existing = self._data.get(key, None)
        self._data[key] = (value, self.insert_counter)
        self._lru[self.insert_counter] = key
        if existing is not None:
            self._lru.pop(existing[1], None)  # remove old counter value, if it exists
        self.insert_counter += 1
        if (sz := len(self._data)) > self.size:  # is cache full?
            assert sz == self.size + 1
            while (key := self._lru.get(self.oldest, None)) is None:  # index may not be present, if value was reinserted
                self.oldest += 1
            del self._data[key]  # remove oldest key / value from dictionary
            del self._lru[self.oldest]
            self.oldest += 1  # next oldest index
        assert len(self._lru) == len(self._data)

    def get(self, key: Hashable) -> Optional[Any]:
        """Get value or return None if not in cache"""
        if (tpl := self._data.get(key, None)) is not None:
            return tpl[0]
        return None


if __name__ == "__main__":
    CACHE_SIZE = 1000
    TEST_SIZE = 1_000_000
    cache = LruCache(size=CACHE_SIZE)

    all = []
    for i in range(TEST_SIZE):
        all.append(random.randint(-5000, 5000))

    summary = collections.defaultdict(int)
    for value in all:
        cache.insert(value, value * value)
        summary[value] += 1
    smallest = TEST_SIZE
    largest = -TEST_SIZE
    total = 0
    for value, count in summary.items():
        smallest = min(smallest, count)
        largest = max(largest, count)
        total += count
    avg = total / len(summary)
    print(f"{len(summary)} values occurrences range from {smallest} to {largest}, average {avg:.1f}")

    recent = set()  # recent most recent entries
    for i in range(len(all) - 1, -1, -1):  # loop backwards to get the most recent entries
        value = all[i]
        if len(recent) < CACHE_SIZE:
            recent.add(value)
        if value in recent:
            if (r := cache.get(value)) != value * value:
                raise ValueError(f"Cache missing recent {value} {r}")
        else:
            if cache.get(value) != None:
                raise ValueError(f"Cache includes old {value}")

From: Python-list  on behalf of 
Dino 
Date: Wednesday, February 15, 2023 at 3:07 PM
To: python-list@python.org 
Subject: Re: LRU cache

Thank you Mats, Avi and Chris

btw, functools.lru_cache seems rather different from what I need, but
maybe I am missing something. I'll look closer.

On 2/14/2023 7:36 PM, Mats Wichmann wrote:

On 2/14/23 15:07, Dino wrote:






--
https://mail.python.org/mailman/listinfo/python-list


Re: LRU cache

2023-02-15 Thread Dino



Thank you Mats, Avi and Chris

btw, functools.lru_cache seems rather different from what I need, but 
maybe I am missing something. I'll look closer.


On 2/14/2023 7:36 PM, Mats Wichmann wrote:

On 2/14/23 15:07, Dino wrote:




--
https://mail.python.org/mailman/listinfo/python-list


Re: Comparing caching strategies

2023-02-14 Thread Dino

On 2/10/2023 7:39 PM, Dino wrote:


- How would you structure the caching so that different caching 
strategies are "pluggable"? change one line of code (or even a config 
file) and a different caching strategy is used in the next run. Is this 
the job for a design pattern such as factory or facade?



turns out that the strategy pattern was the right one for me.



--
https://mail.python.org/mailman/listinfo/python-list


LRU cache

2023-02-14 Thread Dino



Here's my problem today. I am using a dict() to implement a quick and 
dirty in-memory cache.


I stop adding elements when I reach 1000 elements (a totally 
arbitrary number), but I would like to have something slightly more 
sophisticated to free up space for newer and potentially more relevant 
entries.


I am thinking of the Least Recently Used principle, but how to implement 
that is not immediate. Before I embark on reinventing the wheel, is 
there a tool, library or smart trick that will allow me to remove 
elements with LRU logic?
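For the archive: besides functools.lru_cache (which memoizes a function rather than exposing a dict-like cache), an LRU dict takes only a few lines with collections.OrderedDict. A minimal sketch, with names of my own choosing:

```python
from collections import OrderedDict

class LRUDict:
    """Dict-like cache that evicts the least recently used entry."""
    def __init__(self, maxsize=1000):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)      # refresh recency on overwrite
        self._data[key] = value
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)   # drop least recently used

    def get(self, key, default=None):
        if key in self._data:
            self._data.move_to_end(key)      # a read also counts as a use
            return self._data[key]
        return default

cache = LRUDict(maxsize=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")        # touch "a" so "b" becomes the oldest entry
cache.put("c", 3)     # evicts "b"
print(list(cache._data))  # → ['a', 'c']
```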


thanks

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Comparing caching strategies

2023-02-13 Thread Dino
First off, a big shout out to Peter J. Holzer, who mentioned roaring 
bitmaps a few days ago and led me to quite a discovery.


Now I am stuck with an internal dispute with another software architect 
(well, with a software architect, I should say, as I probably shouldn't 
define myself a software architect when confronted with people with more 
experience than me in building more complex systems).
Anyway, now that I know what roaring bitmaps are (and what they can 
do!), my point is that we should abandon other attempts to build a 
caching layer for our project and just veer decidedly towards relying on 
those magic bitmaps and screw anything else. Sure, there is some 
overhead marshaling our entries into integers and back, but the sheer 
speed and compactness of RBMs trump any other consideration (according 
to me, not according to the other guy, obviously).


Long story short: I want to prototype a couple of caching strategies in 
Python using bitmaps, and measure both performance and memory usage.


So, here are a few questions from an inexperienced programmer for you, 
friends. Apologies if they are a bit "open ended".


- How would you structure the caching so that different caching 
strategies are "pluggable"? change one line of code (or even a config 
file) and a different caching strategy is used in the next run. Is this 
the job for a design pattern such as factory or facade?


- what tool should I use to measure/log performance and memory 
occupation of my script? Google is coming up with quite a few options, 
but I value the opinion of people here a lot.


Thank you for any feedback you may be able to provide.

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: RE: RE: bool and int

2023-01-28 Thread Dino



you have your reasons, and I was tempted to stop there, but... I have to 
pick this...


On 1/26/2023 10:09 PM, avi.e.gr...@gmail.com wrote:

  You can often borrow
ideas and code from an online search and hopefully cobble "a" solution
together that works well enough. Of course it may suddenly fall apart.


also carefully designed systems that are the work of experts may 
suddenly fall apart.


Thank you for all the time you have used to address the points I raised. 
It was interesting reading.


Dino

--
https://mail.python.org/mailman/listinfo/python-list


Re: bool and int

2023-01-26 Thread Dino

On 1/25/2023 5:42 PM, Chris Angelico wrote:



Try this (or its equivalent) in as many languages as possible:

x = (1 > 2)
x == 0

You'll find that x (which has effectively been set to False, or its
equivalent in any language) will be equal to zero in a very large
number of languages. Thus, to an experienced programmer, it would
actually be quite the opposite: having it NOT be a number would be the
surprising thing!


I thought I had already responded to this, but I can't see it. Weird.

Anyway, straight out of the Chrome DevTools console:

x = (1>2)
false

x == 0
true

typeof(x)
'boolean'

typeof(0)
'number'

typeof(x) == 'number'
false

So, you are technically correct, but you can see that JavaScript - which 
comes with many gotchas - does not offer this particular one.



--
https://mail.python.org/mailman/listinfo/python-list


Re: RE: bool and int

2023-01-26 Thread Dino



Wow. That was quite a message and an interesting read. Tempted to go 
deep and say what I agree and what I disagree with, but there are two 
issues: 1) time 2) I will soon be at a disadvantage discussing with 
people (you or others) who know more than me (which doesn't make them 
right necessarily, but certainly they'll have the upper-hand in a 
discussion).


Personally, in the first part of my career I got into the habit of 
learning things fast, sometimes superficially I confess, and then get 
stuff done hopefully within time and budget. Not the recommended 
approach if you need to build software for a nuclear plant. An OK 
approach (within reason) if you build websites or custom solutions for 
this or that organization and the budget is what it is. After all, 
technology moves sooo fast, and what we learn in detail today is bound 
to be old and possibly useless 5 years down the road.


Also, I argue that there is value in having familiarity with lots of 
different technologies (front-end and back-end) and knowing (or at 
lease, having a sense) of how they can all be made play together with an 
appreciation of the different challenges and benefits that each domain 
offers.


Anyway, everything is equivalent to a Turing machine and AI will screw 
everyone, including programmers, eventually.


Thanks again and have a great day

Dino

On 1/25/2023 9:14 PM, avi.e.gr...@gmail.com wrote:

Dino,

There is no such things as a "principle of least surprise" or if you insist
there is, I can nominate many more such "rules" such as "the principle of
get out of my way and let me do what I want!"

Computer languages with too many rules are sometimes next to unusable in
practical situations.

I am neither defending or attacking choices Python or other languages have
made. I merely observe and agree to use languages carefully and as
documented.


--
https://mail.python.org/mailman/listinfo/python-list


Re: HTTP server benchmarking/load testing in Python

2023-01-26 Thread Dino

On 1/25/2023 4:30 PM, Thomas Passin wrote:

On 1/25/2023 3:29 PM, Dino wrote:
Great!  Don't forget what I said about potential overheating if you hit 
the server with as many requests as it can handle.


Noted. Thank you.




--
https://mail.python.org/mailman/listinfo/python-list


Re: HTTP server benchmarking/load testing in Python

2023-01-25 Thread Dino

On 1/25/2023 3:27 PM, Dino wrote:

On 1/25/2023 1:33 PM, orzodk wrote:


I have used locust with success in the past.

https://locust.io


First impression, exactly what I need. Thank you Orzo!


the more I learn about Locust and I tinker with it, the more I love it. 
Thanks again.

--
https://mail.python.org/mailman/listinfo/python-list


Re: HTTP server benchmarking/load testing in Python

2023-01-25 Thread Dino

On 1/25/2023 1:21 PM, Thomas Passin wrote:



I actually have a Python program that does exactly this.  


Thank you, Thomas. I'll check out Locust, mentioned by Orzodk, as it 
looks like a mature library that appears to do exactly what I was hoping.




--
https://mail.python.org/mailman/listinfo/python-list


Re: HTTP server benchmarking/load testing in Python

2023-01-25 Thread Dino

On 1/25/2023 1:33 PM, orzodk wrote:



I have used locust with success in the past.

https://locust.io


First impression, exactly what I need. Thank you Orzo!
--
https://mail.python.org/mailman/listinfo/python-list


Re: bool and int

2023-01-25 Thread Dino

On 1/23/2023 11:22 PM, Dino wrote:

 >>> b = True
 >>> isinstance(b,bool)
True
 >>> isinstance(b,int)
True
 >>>


ok, I read everything you guys wrote. Everyone's got their reasons 
obviously, but allow me to observe that there's also something called 
"principle of least surprise".


In my case, it took me some time to figure out where a nasty bug was 
hidden. Letting a bool be an int is quite a gotcha, no matter how hard 
the benevolent dictator tries to convince me otherwise!
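For the record: since bool is a subclass of int, isinstance(x, int) is True for booleans; an exact-type check sidesteps the gotcha:

```python
b = True
n = 1

print(isinstance(b, int))   # True - bool subclasses int
print(type(b) is bool)      # True
print(type(n) is bool)      # False - an exact-type check tells them apart

# e.g. filtering real integers out of mixed data
values = [1, True, 0, False, 2]
ints_only = [v for v in values if type(v) is int]
print(ints_only)            # [1, 0, 2]
```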




--
https://mail.python.org/mailman/listinfo/python-list


HTTP server benchmarking/load testing in Python

2023-01-25 Thread Dino



Hello, I could use something like Apache ab in Python ( 
https://httpd.apache.org/docs/2.4/programs/ab.html ).


The reason why ab doesn't quite cut it for me is that I need to define a 
pool of HTTP requests and I want the tool to run those (as opposed to 
running the same request over and over again)


Does such a marvel exist?

Thinking about it, it doesn't necessarily need to be Python, but I guess 
I would have a chance to tweak things if it was.
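In the meantime, a bare-bones version of "run a pool of different requests" fits in a few lines of stdlib Python. The throwaway local server and the URL list below are placeholders, just to make the sketch self-contained:

```python
import concurrent.futures
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# --- throwaway local server so the example runs on its own ---
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")
    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

# --- the actual sketch: a pool of *different* requests ---
pool = [f"{base}/items/{i}" for i in range(20)]  # placeholder URLs

def fetch(url):
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        resp.read()
        return resp.status, time.perf_counter() - start

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as ex:
    results = list(ex.map(fetch, pool))

server.shutdown()
statuses = [s for s, _ in results]
print(f"{len(results)} requests, statuses: {set(statuses)}")
```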


Thanks

Dino
--
https://mail.python.org/mailman/listinfo/python-list


bool and int

2023-01-24 Thread Dino



$ python
Python 3.8.10 (default, Mar 15 2022, 12:22:08)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> b = True
>>> isinstance(b,bool)
True
>>> isinstance(b,int)
True
>>>

WTF!

--
https://mail.python.org/mailman/listinfo/python-list


Re: tree representation of Python data

2023-01-21 Thread Dino


you rock. Thank you, Stefan.

Dino

On 1/21/2023 2:41 PM, Stefan Ram wrote:

r...@zedat.fu-berlin.de (Stefan Ram) writes:

def display_( object, last ):
    directory = object; result = ''; count = len( directory )
    for entry in directory:
        count -= 1; name = entry; indent = ''
        for c in last[ 1: ]: indent += '│   ' if c else ''
        indent += '├──' if count else '└──' if last else ''
        result += '\n' + indent +( ' ' if indent else '' )+ name
        if directory[ entry ]:
            result += display_( directory[ entry ], last +[ count ])
    return result


   This ultimate version has some variable names made more speaking:

def display_( directory, container_counts ):
    result = ''; count = len( directory )
    for name in directory:
        count -= 1; indent = ''
        for container_count in container_counts[ 1: ]:
            indent += '│   ' if container_count else ''
        indent += '├──' if count else '└──' if container_counts else ''
        result += '\n' + indent +( ' ' if indent else '' )+ name
        if directory[ name ]:
            result += display_\
                ( directory[ name ], container_counts +[ count ])
    return result




--
https://mail.python.org/mailman/listinfo/python-list


Re: ok, I feel stupid, but there must be a better way than this! (finding name of unique key in dict)

2023-01-21 Thread Dino



I learned new things today and I thank you all for your responses.

Please consider yourself thanked individually.

Dino

On 1/20/2023 10:29 AM, Dino wrote:


let's say I have this list of nested dicts:


--
https://mail.python.org/mailman/listinfo/python-list


tree representation of Python data

2023-01-21 Thread Dino


I have a question that is a bit of a shot in the dark. I have this nice 
bash utility installed:


$ tree -d unit/
unit/
├── mocks
├── plugins
│   ├── ast
│   ├── editor
│   ├── editor-autosuggest
│   ├── editor-metadata
│   ├── json-schema-validator
│   │   └── test-documents
│   └── validate-semantic
│       ├── 2and3
│       ├── bugs
│       └── oas3
└── standalone
    └── topbar-insert

I just thought that it would be great if there was a Python utility that 
visualized a similar graph for nested data structures.
Of course I am aware of indent (json.dumps()) and pprint, and they are 
OK options for my need. It's just that the compact, improved 
visualization would be nice to have. Not so nice that I would go out of 
my way to build it, but nice enough to use an existing package.
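For what it's worth, a toy sketch of such a printer for nested dicts takes only a few lines (a toy, not a replacement for a real package; it handles only dicts, not lists or cycles):

```python
# A toy tree-style printer for nested dicts, mimicking the box-drawing
# output of the Unix `tree` command.
def tree_lines(data, prefix=""):
    lines = []
    items = list(data.items())
    for i, (name, child) in enumerate(items):
        last = i == len(items) - 1
        lines.append(prefix + ("└── " if last else "├── ") + str(name))
        if isinstance(child, dict) and child:
            # Continue the vertical rule only if siblings remain below.
            lines.extend(tree_lines(child, prefix + ("    " if last else "│   ")))
    return lines

print("\n".join(tree_lines({"unit": {"mocks": {}, "plugins": {"ast": {}}}})))
# └── unit
#     ├── mocks
#     └── plugins
#         └── ast
```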


Thanks

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: ok, I feel stupid, but there must be a better way than this! (finding name of unique key in dict)

2023-01-20 Thread Dino

On 1/20/2023 11:06 AM, Tobiah wrote:

On 1/20/23 07:29, Dino wrote:




This doesn't look like the program output you're getting.


you are right that I tweaked the name of fields and variables manually 
(forgot a couple of places, my bad) to illustrate the problem more 
generally, but hopefully you get the spirit.


"value": cn,
"a": cd[cn]["a"],
"b": cd[cn]["b"]

Anyway, the key point (ooops, a pun) is if there's a more elegant way to 
do this (i.e. get a reference to the unique key in a dict() when the key 
is unknown):


cn = list(cd.keys())[0] # There must be a better way than this!
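For reference, the idiom usually suggested for grabbing the single key of a one-entry dict, without materializing a list of all keys:

```python
# next(iter(...)) pulls out the first (here: only) key in O(1),
# without building a list of every key the way list(d.keys())[0] does.
cd = {"some_key": {"a": 1, "b": 2}}

cn = next(iter(cd))                 # just the key
key, val = next(iter(cd.items()))   # key and value together

assert cn == "some_key"
assert (key, val) == ("some_key", {"a": 1, "b": 2})
```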

Thanks

--
https://mail.python.org/mailman/listinfo/python-list


ok, I feel stupid, but there must be a better way than this! (finding name of unique key in dict)

2023-01-20 Thread Dino



let's say I have this list of nested dicts:

[
  { "some_key": {'a':1, 'b':2}},
  { "some_other_key": {'a':3, 'b':4}}
]

I need to turn this into:

[
  { "value": "some_key", 'a':1, 'b':2},
  { "value": "some_other_key", 'a':3, 'b':4}
]

I actually did it with:

listOfDescriptors = list()
for cd in origListOfDescriptors:
    cn = list(cd.keys())[0] # There must be a better way than this!
    listOfDescriptors.append({
        "value": cn,
        "type": cd[cn]["a"],
        "description": cd[cn]["b"]
    })

and it works, but I look at this and think that there must be a better 
way. Am I missing something obvious?
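One tighter spelling of the same transformation, for comparison (keeping the 'a'/'b' names from the example at the top, and assuming each dict really holds exactly one entry):

```python
# Same transformation as the loop, using d.items() to pull out the
# single entry and ** to splat the inner dict into the result.
origListOfDescriptors = [
    {"some_key": {"a": 1, "b": 2}},
    {"some_other_key": {"a": 3, "b": 4}},
]

listOfDescriptors = [
    {"value": k, **v}
    for d in origListOfDescriptors
    for k, v in d.items()   # each d holds exactly one entry
]

assert listOfDescriptors == [
    {"value": "some_key", "a": 1, "b": 2},
    {"value": "some_other_key", "a": 3, "b": 4},
]
```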


PS: Screw OpenAPI!

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-17 Thread Dino


Thanks a lot, Edmondo. Or better... Grazie mille.

On 1/17/2023 5:42 AM, Edmondo Giovannozzi wrote:


Sorry,
I was just creating an array of 400x100,000 elements that I fill with random 
numbers:

   a = np.random.randn(400,100_000)

Then I pick one element at random: it is just a quick sort on a row, after which I 
take an element from another row. But that doesn't matter; I'm just taking a random 
element. I could have gotten it in other ways, but this was the first that came to 
my mind.

  ia = np.argsort(a[0,:])
  a_elem = a[56, ia[0]]

Then I'm finding that element in the whole matrix a (of course I know where it 
is, but I want to test the speed of a linear search done at the C level):

%timeit isel = a == a_elem

Actually isel is a boolean array that is True where a[i,j] == a_elem and False 
where a[i,j] != a_elem. It may find more than one element but, of course, in 
our case it will find only the element that we selected at the beginning. 
So this gives the speed of a linear search plus the time needed to allocate 
the boolean array. The search runs over the whole matrix of 40 million elements, 
not just over one of its rows of 100k elements.

On a single row (which, I should say, I chose to be contiguous) it is much 
faster.

%timeit isel = a[56,:] == a_elem
26 µs ± 588 ns per loop (mean ± std. dev. of 7 runs, 1 loops each)

The matrix holds double-precision numbers, which are 8 bytes each; I haven't 
tested it on character strings.

This was meant to be an estimate of the speed one can get by going down to the C 
level.
You lose, of course, the possibility of having a relational database, you need to 
keep everything in memory, etc...

A package that implements tables based on numpy is pandas: 
https://pandas.pydata.org/

I hope that it can be useful.




--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-16 Thread Dino

On 1/16/2023 1:18 PM, Edmondo Giovannozzi wrote:


As a comparison with numpy. Given the following lines:

import numpy as np
a = np.random.randn(400,100_000)
ia = np.argsort(a[0,:])
a_elem = a[56, ia[0]]

I have just taken an element randomly in a numeric table of 400x100,000 elements
To find it with numpy:

%timeit isel = a == a_elem
35.5 ms ± 2.79 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

And
%timeit a[isel]
9.18 ms ± 371 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As the data are not ordered, it searches them one by one, but at the C level.
Of course it depends on a lot of things...


thank you for this. It's probably my lack of experience with Numpy, 
but... can you explain what is going on here in more detail?


Thank you

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-16 Thread Dino

On 1/16/2023 2:53 AM, David wrote:

See here:
   https://docs.python.org/3/reference/expressions.html#assignment-expressions
   https://realpython.com/python-walrus-operator/


Thank you, brother.



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-16 Thread Dino



Just wanted to take a moment to express my gratitude to everyone who 
responded here. You have all been so incredibly helpful. Thank you


Dino

On 1/14/2023 11:26 PM, Dino wrote:


Hello, I have built a PoC service in Python Flask for my work, and - now 
that the point is made - I need to make it a little more performant (to 
be honest, chances are that someone else will pick up from where I left 
off, and implement the same service from scratch in a different language 
(GoLang? .Net? Java?) but I am digressing).

--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-15 Thread Dino

On 1/15/2023 2:23 PM, Weatherby,Gerard wrote:

That’s about what I got using a Python dictionary on random data on a high 
memory machine.

https://github.com/Gerardwx/database_testing.git

It’s not obvious to me how to get it much faster than that.


Gerard, you are a rockstar. This is going to be really useful if I do 
decide to adopt sqlite3 for my PoC, as I understand what's going on 
conceptually, but never really used sqlite (nor SQL in a long long 
time), so this may save me a bunch of time.


I created a 300 Mb DB using your script. Then:

$ ./readone.py
testing 2654792 of 4655974
Found somedata0002654713 for 1ed9f9cd-0a9e-47e3-b0a7-3e1fcdabe166 in 
0.23933520219 seconds


$ ./prefetch.py
Index build 4.42093784897 seconds
testing 3058568 of 4655974
Found somedata202200 for 5dca1455-9cd6-4e4d-8e5a-7e6400de7ca7 in 
4.443999403715e-06 seconds


So, if I understand right:

1) once I built a dict out of the DB (in about 4 seconds), I was able to 
look up an entry/record in 4 microseconds(!)


2) looking up a record/entry using a Sqlite query took 0.2 seconds (i.e. 
roughly 50,000x slower)


Interesting. Thank you for this. Very informative. I really appreciate 
that you took the time to write this.


The conclusion seems to me that I probably don't want to go the Sqlite 
route, as I would be placing my data into a database just to extract it 
back into a dict when I need it if I want it fast.


Ps: a few minor fixes to the README as this may be helpful to others.

./venv/... => ./env/..

i.e.
 ./env/bin/pip install -U pip
 ./env/bin/pip install -e .

Also add part in []

Run create.py [size of DB in bytes] prior to running readone.py and/or 
prefetch.py


BTW, can you tell me what is going on here? what's := ?

   while (increase := add_some(conn,adding)) == 0:

https://github.com/Gerardwx/database_testing/blob/main/src/database_testing/create.py#L40
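For reference, := is the "walrus" operator (an assignment expression, Python 3.8+): it binds a name and yields the value inside an expression, so the loop above keeps calling add_some() until it returns nonzero. A toy example of the same pattern:

```python
# The walrus operator := assigns and returns a value in one expression,
# so a call result can be bound and tested without a separate statement.
chunks = iter([3, 5, 0, 7])

total = 0
while (n := next(chunks)) != 0:   # bind n, then test it, in one step
    total += n

assert total == 8                 # stops when the 0 sentinel appears
```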

Dino
--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-15 Thread Dino



Thank you, Peter. Yes, setting up my own indexes is more or less the 
idea of the modular cache that I was considering. Seeing others think in 
the same direction makes it look more viable.


About Scalene, thank you for the pointer. I'll do some research.

Do you have any idea about the speed of a SELECT query against a 100k 
rows / 300 Mb Sqlite db?


Dino

On 1/15/2023 6:14 AM, Peter J. Holzer wrote:

On 2023-01-14 23:26:27 -0500, Dino wrote:

Hello, I have built a PoC service in Python Flask for my work, and - now
that the point is made - I need to make it a little more performant (to be
honest, chances are that someone else will pick up from where I left off,
and implement the same service from scratch in a different language (GoLang?
.Net? Java?) but I am digressing).

Anyway, my Flask service initializes by loading a big "table" of 100k rows
and 40 columns or so (memory footprint: order of 300 Mb)


300 MB is large enough that you should at least consider putting that
into a database (Sqlite is probably simplest. Personally I would go with
PostgreSQL because I'm most familiar with it and Sqlite is a bit of an
outlier).

The main reason for putting it into a database is the ability to use
indexes, so you don't have to scan all 100 k rows for each query.

You may be able to do that for your Python data structures, too: Can you
set up dicts which map to subsets you need often?
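A minimal sketch of that idea, assuming the rows are plain dicts (the column names here are invented for illustration): precompute a dict mapping a frequently-filtered column value to the subset of rows carrying it, so those queries skip the full scan.

```python
from collections import defaultdict

# Toy stand-in for the 100k-row table; column names are invented.
rows = [
    {"id": 1, "color": "red",  "size": 10},
    {"id": 2, "color": "blue", "size": 20},
    {"id": 3, "color": "red",  "size": 30},
]

by_color = defaultdict(list)      # secondary index on the "color" column
for row in rows:
    by_color[row["color"]].append(row)

# An equality filter on "color" now touches only the matching subset:
red_rows = by_color["red"]
assert [r["id"] for r in red_rows] == [1, 3]
```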

There are some specialized in-memory bitmap implementations which can be
used for filtering. I've used
[Judy bitmaps](https://judy.sourceforge.net/doc/Judy1_3x.htm) in the
past (mostly in Perl).
These days [Roaring Bitmaps](https://www.roaringbitmap.org/) is probably
the most popular. I see several packages on PyPI - but I haven't used
any of them yet, so no recommendation from me.

Numpy might also help. You will still have linear scans, but it is more
compact and many of the searches can probably be done in C and not in
Python.


As you can imagine, this is not very performant in its current form, but
performance was not the point of the PoC - at least initially.


For performanc optimization it is very important to actually measure
performance, and a good profiler helps very much in identifying hot
spots. Unfortunately until recently Python was a bit deficient in this
area, but [Scalene](https://pypi.org/project/scalene/) looks promising.

 hp



--
https://mail.python.org/mailman/listinfo/python-list


Re: Fast lookup of bulky "table"

2023-01-15 Thread Dino


Thank you for your answer, Lars. Just a clarification: I am already 
doing a rough measuring of my queries.


A fresh query without any caching: < 4s.

Cached full query: < 5 micro-s (i.e. 6 orders of magnitude faster)

Desired speed for my POC: 10

Also, I didn't want to ask a question with way too many "moving parts", 
but when I talked about the "table", it's actually a 100k long list of 
IDs. I can then use each ID to invoke an API that will return those 40 
attributes. The API is fast, but still, I am bound to loop through the 
whole thing to respond to the query, that's unless I pre-load the data 
into something that allows faster access.


Also, as you correctly observed, "looking good with my colleagues" is a 
nice-to-have feature at this point, not really an absolute requirement :)


Dino

On 1/15/2023 3:17 AM, Lars Liedtke wrote:

Hey,

before you start optimizing, I would suggest that you measure response 
times, query times, data search times and so on. In order to save 
time, you have to know where you "lose" time.


Does your service really have to load the whole table at once? Yes that 
might lead to quicker response times on requests, but databases are 
often very good at caching themselves, so the first request might 
be slower than following requests with similar parameters. Do you use a 
database, or are you reading from a file? Are you maybe looping through 
your whole dataset on every request, instead of asking for the specific 
data?


Before you start introducing a cache and its added complexity, do you 
really need that cache?


You are talking about saving microseconds, that sounds a bit as if you 
might be “overdoing” it. How many requests will you have in the future? 
At least in which magnitude and how quick do they have to be? You write 
about 1-4 seconds on your laptop. But that does not really tell you that 
much, because most probably the service will run on a server. I am not 
saying that you should get a server or a cloud-instance to test against, 
but to talk with your architect about that.


I totally understand your impulse to appear as good as can be, but you 
have to know where you really need to debug and optimize. It will not be 
advantageous for you, if you start to optimize for optimizing's sake. 
Additionally, if your service is a PoC, optimizing now might not be the 
first thing you have to worry about; worry instead about making everything 
as simple and readable as possible, and about not spending too much 
time just showing how it could work.


But of course, I do not know the tasks given to you and the expectations 
you have to fulfil. All I am trying to say is to reconsider where you 
really could improve and how far you have to improve.




--
https://mail.python.org/mailman/listinfo/python-list


Fast lookup of bulky "table"

2023-01-14 Thread Dino



Hello, I have built a PoC service in Python Flask for my work, and - now 
that the point is made - I need to make it a little more performant (to 
be honest, chances are that someone else will pick up from where I left 
off, and implement the same service from scratch in a different language 
(GoLang? .Net? Java?) but I am digressing).


Anyway, my Flask service initializes by loading a big "table" of 100k 
rows and 40 columns or so (memory footprint: order of 300 Mb) and then 
accepts queries through a REST endpoint. Columns are strings, enums, and 
numbers. Once initialized, the table is read only. The endpoint will 
parse the query and match it against column values (equality, 
inequality, greater than, etc.) Finally, it will return a (JSON) list of 
all rows that satisfy all conditions in the query.


As you can imagine, this is not very performant in its current form, but 
performance was not the point of the PoC - at least initially.


Before I deliver the PoC to a more experienced software architect who 
will look at my code, though, I wouldn't mind to look a bit less lame 
and do something about performance in my own code first, possibly by 
bringing the average time for queries down from where it is now (order 
of 1 to 4 seconds per query on my laptop) to 1 or 2 milliseconds on 
average.


To be honest, I was already able to bring the time down to a handful of 
microseconds thanks to a rudimentary cache that will associate the 
"signature" of a query to its result, and serve it the next time the 
same query is received, but this may not be good enough: 1) queries 
might be many and very different from one another each time, AND 2) I am 
not sure the server will have a ton of RAM if/when this thing - or 
whatever is derived from it - is placed into production.
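One way to keep the signature cache's footprint bounded is functools.lru_cache, which evicts the least recently used entries once it hits a size cap; a sketch (the table and matching logic below are toy stand-ins, not the real service):

```python
from functools import lru_cache

TABLE = [{"n": i} for i in range(100)]   # stand-in for the 100k-row table

def matches(signature, row):
    # Stand-in predicate; the real one parses the query signature.
    return row["n"] % 2 == int(signature)

@lru_cache(maxsize=1024)                 # cap entries so RAM stays bounded
def run_query(signature):
    return [row for row in TABLE if matches(signature, row)]

assert len(run_query("0")) == 50         # first call: full scan, result cached
assert len(run_query("0")) == 50         # second call: served from the cache
assert run_query.cache_info().hits == 1
```

The maxsize cap addresses concern (2) above: old signatures get evicted instead of growing the cache without bound.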


How can I make my queries generally more performant, ideally also in 
case of a new query?


Here's what I have been considering:

1. making my cache more "modular", i.e. cache the result of certain 
(wide) queries. When a complex query comes in, I may be able to restrict 
my search to a subset of the rows (as determined by a previously cached 
partial query). This should keep the memory footprint under control.


2. Load my data into a numpy.array and use numpy.array operations to 
slice and dice my data.


3. load my data into sqlite3 and use SELECT statements to query my table. 
I have never used sqlite, plus there's some extra complexity as 
comparing certain columns requires custom logic, but I wonder if this 
architecture would work well also when dealing with a 300Mb database.
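A tiny sketch of option 3 with the standard-library sqlite3 module (toy columns, invented names): load the rows into an in-memory table and let an index serve the equality filters.

```python
import sqlite3

# In-memory SQLite table standing in for the 100k-row, 40-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, color TEXT, size INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [(1, "red", 10), (2, "blue", 20), (3, "red", 30)])
conn.execute("CREATE INDEX idx_color ON t(color)")  # avoids full scans

rows = conn.execute(
    "SELECT id FROM t WHERE color = ? AND size > ?", ("red", 15)
).fetchall()
assert rows == [(3,)]
```

The custom comparison logic mentioned above could be plugged in via conn.create_function(), at the cost of losing the index for those predicates.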


4. Other ideas?

Hopefully I made sense. Thank you for your attention

Dino
--
https://mail.python.org/mailman/listinfo/python-list


RE: [IronPython] IronPython 2.7 Now Available

2011-03-13 Thread Dino Viehland
The PTVS release is really an extended version of the tools in IronPython 2.7.  
It adds support for CPython including debugging, profiling, etc...  while still 
supporting IronPython as well.  We'll likely either replace the tools 
distributed w/ IronPython with this version (maybe minus things like HPC 
support) or we'll pull the IpyTools out of the distribution and encourage 
people to go for the separate download.  No changes will likely happen until 
IronPython 3.x though as 2.7 is now out the door and it'd be a pretty 
significant change.

For the time being you'll need to choose one or the other - either skip 
installing the IpyTools w/ the IronPython install and install the PTVS, or 
just stick w/ the existing IronPython tools.

> -Original Message-
> From: users-boun...@lists.ironpython.com [mailto:users-
> boun...@lists.ironpython.com] On Behalf Of Medcoff, Charles
> Sent: Sunday, March 13, 2011 2:15 PM
> To: Discussion of IronPython; python-list
> Subject: Re: [IronPython] IronPython 2.7 Now Available
> 
> Can someone on the list clarify differences or overlap between the tools
> included in this release, and the PTVS release?
> ___
> Users mailing list
> us...@lists.ironpython.com
> http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Python Tools for Visual Studio from Microsoft - Free & Open Source

2011-03-10 Thread Dino Viehland


Patty wrote:
> Thanks so much for this reference - and the detailed further explanation!  I
> have a Windows 7 system and recently installed Visual Studio 2010 for the
> SQL Server, Visual C/C++ and Visual Basic.  I would love to have this Python
> tool installed under Visual Studio but a few questions:   1)  I have regular
> Python installed not Cpython or Jpython or any other variant (have both 2.6
> and 3.2 versions) so would that be a problem and it won't install or won't
> work?  2) I saw that this was a beta, would there be an automatic notification
> that there are upgrades (I mean within the software itself) or would it be
> advisable for me to wait until it goes final because I am relatively newer to
> Python and maybe shouldn't be mucking with a beta of
> something   3) there is a message bar at the top right corner of the web
> page that a certain number of people are 'following this project' Is that
> where you would rely on for upgrades notifications or what exactly would
> you be following as far as a 'project' of this type?

CPython is actually regular Python - the C just clarifies that it's the 
implementation
written in C (vs. C#, Java, or Python).  

There won't be any notification of updates via the software itself, but given
that you heard about the 1st release within days of it coming out my guess is
you'll hear about the updates as well.  

I'm not actually certain if following a project on CodePlex will give you e-mail
notifications or not.  I typically subscribe to CodePlex's RSS feed for projects
I'm interested in - for example this feed 
http://pytools.codeplex.com/project/feeds/rss 
includes all changes to the project.  There's other feeds below the RSS button
which track just new releases or other things.
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Packages at Python.org

2010-12-01 Thread Dino Viehland


Kirby wrote:
> ** Unconfirmed rumors about IronPython leave me blog searching this
> afternoon.  Still part of Codeplex?

IronPython is still using CodePlex for bug tracking and posting releases but
active development is now on GitHub w/ a Mercurial mirror.  Jeff's blog has
more info: http://jdhardy.blogspot.com/


-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Why Python3

2010-06-29 Thread Dino Viehland
Terry wrote:
>  > IronPython targets Python 2.6.
> 
> They plan to release a 2.7 version sometime this year after CPython2.7
> is released. They plan to release a 3.2 version early next year, soon
> after CPython. They should be able to do that because they already have
> a 3.1 version mostly done (but will not release it as such) and 3.2 has
> no new syntax, so the 3.1 development version will easily morph into a
> 3.2 release version. I forget just where I read this, but here is a
> public article.
> http://www.itworld.com/development/104506/python-3-and-ironpython
>   Cameron Laird, Python/IronPython developer '''
> As Jimmy Schementi, a Program Manager with Microsoft, e-mailed me last
> week, "IronPython's roadmap over the next year includes compatibility
> with Python 3. Also, we're planning on a release ... before our first
> 3.2-compatible release which will target 2.7 compatibility."

Close but not 100% correct - we do plan to release 2.7 sometime this year
but 3.2 is going to be sometime next year, not early, I would guess EOY.  
I guess Jimmy misspoke a little there but the "2.7 this year 3.2 next year"
plan is what I said during my PyCon State of IronPython talk and it hasn't
changed yet.

Also we have only a few 3.x features implemented (enabled w/ a -X:Python30 
option since 2.6) instead of having a different build for 3.x.  Running 
with that option isn't likely to run any real 3.x code though but it gives
people a chance to test out a few new features.  Of course implementing 2.7
also gets us much closer to 3.x than we are today w/ all its backports so 
we are certainly making progress.

-- 
http://mail.python.org/mailman/listinfo/python-list


ssl, v23 client, v3 server...

2010-03-08 Thread Dino Viehland
In the ssl module docs (and in the tests) it says that if you have a client 
specifying PROTOCOL_SSLv23 (so it'll use v2 or v3) and a server specifying 
PROTOCOL_SSLv3 (so it'll only use v3) that you cannot connect between the two.  
Why doesn't this end up using SSL v3 for the communication?


-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Modifying Class Object

2010-02-10 Thread Dino Viehland
Steve wrote:
> id() simply returns a unique value identifying a particular object. In
> CPython, where objects do not migrate in memory once created, the
> memory
> address of the object is used. In IronPython each object is assigned an
> id when it is created, and that value is stored as an attribute.

Just a point of clarification: In IronPython ids are lazily assigned upon
a call to the id().  They're actually fairly expensive to create because 
the ids need to be maintained by a dictionary which uses weak references.

> >> If you disagree, please write (in any implementation you like: it need
> >> not even be portable, though I can't imagine why ti wouldn't be) a
> >> Python function which takes an id() value as its argument and
> >> returns the value for which the id() value was provided.

Just for fun this works in IronPython 2.6:

>>> import clr
>>> clr.AddReference('Microsoft.Dynamic')
>>> from Microsoft.Scripting.Runtime import IdDispenser
>>> x = object()
>>> id(x)
43
>>> IdDispenser.GetObject(43)

>>> IdDispenser.GetObject(43) is x
True

Please, please, no one ever use this code!

I do generally agree with the sentiment that id is object identity and in
no way related to pointers though.

-- 
http://mail.python.org/mailman/listinfo/python-list


RE: myths about python 3

2010-01-28 Thread Dino Viehland
Stefan wrote:
> From an implementor's point of view, it's actually quite the opposite. Most
> syntax features of Python 3 can be easily implemented on top of an existing
> Py2 Implementation (we have most of them in Cython already, and I really
> found them fun to write), and the shifting-around in the standard library
> can hardly be called non-trivial. All the hard work that went into the
> design of CPython 3.x (and into its test suite) now makes it easy to just
> steal from what's there already.
> 
> The amount of work that the Jython project put into catching up from 2.1 to
> 2.5/6 (new style classes! generators!) is really humongous compared to the
> adaptations that an implementation needs to do to support Python 3 code. I
> have great respect for the Jython project for what they achieved in the
> last couple of years. (I also have great respect for the IronPython project
> for fighting the One Microsoft Way into opening up, but that's a different
> kind of business.)
> 
> If there was enough interest from the respective core developers, I
> wouldn't be surprised if we had more than one 'mostly compatible'
> alternative Python 3 implementation in a couple of months. But it's the
> obvious vicious circle business. As long as there aren't enough important
> users of Py3, alternative implementations won't have enough incentives to
> refocus their scarce developer time. Going for 2.6/7 first means that most
> of the Py3 work gets done anyway, so it'll be even easier then. That makes
> 2.6->2.7->3.2/3 the most natural implementation path. (And that, again,
> makes it a *really* good decision that 2.7 will be the last 2.x release line.)

I just want to echo this as I completely agree.  Last time I went through the
list it looked like there were around 10 major new features (some of them even
not so major) that we needed to implement to bring IronPython up to the 3.0
level.  It shouldn't be too time consuming, and it greatly improves our 
compatibility by finally having the same string types, but our users don't 
yet want us to stop supporting 2.x.

-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Ironpython experience

2009-12-23 Thread Dino Viehland
Lev wrote:
> I'm an on and off Python developer and use it as one of the tools.
> Never for writing "full-blown" applications, but rather small, "one-of-
> a-kind" utilities. This time I needed some sort of backup and
> reporting utility, which is to be used by the members of our team
> once or twice a day. Execution time is supposed be negligible. The
> project was an ideal candidate to be implemented in Python.  As
> expected the whole script was about 200 lines and was ready in a 2
> hours (the power of Python!).Then I downloaded Ironpython and
> relatively painlessly (except the absence of zlib) converted the
> Python code to Ironpython. Works fine and Ironython really is Python.
> But...
> 
> The CPython 2.6 script runs 0.1 seconds, while Ironpython 2.6 runs
> about 10 seconds. The difference comes from the start-up, when all
> these numerous dlls/assemblies are loaded and JITed.
> 
Is there any way to speed up the process?

Can you give us more information about the environment you're running
in?  E.g. how did you install IronPython, is this on 32-bit or 64-bit
and are you using ipy.exe or ipy64.exe?

The sweet spot to be in is on a 32-bit machine or a 64-bit machine
and using ipy.exe.  You should also be using ngen'd (pre-compiled)
binaries which the MSI does for you.  Combining 32-bit plus ngen
should greatly reduce startup time and typically on our test machines
it only takes a couple of seconds 
(http://ironpython.codeplex.com/wikipage?title=IP26FinalVsCPy26Perf&referringTitle=IronPython%20Performance).

That's still a lot worse than CPython startup time but it's much
better than 10 seconds.  We also continue to work on startup time -
there's already some big improvements in our Main branch which should
be showing up in 2.6.1.  Matching CPython is still a long ways off
if we ever can do it but do intend to keep on pushing on it.
 
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: [Python-Dev] PEP 384: Defining a Stable ABI

2009-05-17 Thread Dino Viehland
Dirkjan Ochtman wrote:
>
> It would seem to me that optimizations are likely to require data
> structure changes, for exactly the kind of core data structures that
> you're talking about locking down. But that's just a high-level view,
> I might be wrong.
>


In particular I would guess that ref counting is the biggest issue here.
I would think not directly exposing the field and having inc/dec ref
functions (real methods, not macros) for it would give a lot more
ability to change the API in the future.

It also might make it easier for alternate implementations to support
the same API so some modules could work cross implementation - but I
suspect that's a non-goal of this PEP :).

Other fields directly accessed (via macros or otherwise) might have similar
problems but they don't seem as core as ref counting.
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: interpreter vs. compiled

2008-07-30 Thread Dino Viehland
It looks like the pickle differences are due to two issues.  First IronPython 
doesn't have ASCII strings so it serializes strings as Unicode.  Second there 
are dictionary ordering differences.  If you just do:

{ 'a': True, 'b': set( ) }

Cpy prints: {'a': True, 'b': set([])}
Ipy prints: {'b': set([]), 'a': True}

The important thing is that we interop - and indeed you can send either pickle 
string to either implementation and the correct results are deserialized 
(modulo getting Unicode strings).

For your more elaborate example you're right that there could be a problem 
here.  But the DLR actually recognizes this sort of pattern and optimizes for 
it.  All of the additions in your code are what I've been calling serially 
monomorphic call sites.  That is they see the same types for a while, maybe 
even just once as in your example, and then they switch to a new type - never 
to return to the old one.  When IronPython gives the DLR the code for the call 
site the DLR can detect when the code only differs by constants - in this case 
type version checks.  It will then re-write the code turning the changing 
constants into variables.  The next time through when it sees the same code 
again it'll re-use the existing compiled code with the new sets of constants.

That's still slower than we were in 1.x so we'll need to push on this more in 
the future - for example producing a general rule instead of a type-specific 
rule.  But for the time being having the DLR automatically handle this has been 
working good enough for these situations.
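To make the idea concrete, here is a toy Python sketch of a type-specializing call site (an illustration of the concept only, not the DLR's actual machinery; names like AddSite are invented):

```python
# Toy inline-cache-style call site: it caches a fast path keyed on the
# operand types it has seen, and reuses that path on later calls.
class AddSite:
    def __init__(self):
        self.cache = {}                  # (type, type) -> specialized fn

    def __call__(self, a, b):
        key = (type(a), type(b))
        fn = self.cache.get(key)
        if fn is None:                   # miss: build and cache a rule
            fn = self.specialize(key)
            self.cache[key] = fn
        return fn(a, b)

    def specialize(self, key):
        if key == (int, int):
            return lambda a, b: a + b    # stands in for Int32Ops.Add
        return lambda a, b: a.__add__(b) # generic fallback

add_site = AddSite()
assert add_site(2, 3) == 5               # first call builds the (int, int) rule
assert add_site(2, 3) == 5               # second call reuses the cached rule
assert len(add_site.cache) == 1
```

In the serially monomorphic case described above, each new type pair adds one cached rule and the old ones simply go cold.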

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of castironpi
Sent: Tuesday, July 29, 2008 11:40 PM
To: python-list@python.org
Subject: Re: interpreter vs. compiled

I note that IronPython and Python's pickle.dumps do not return the
same value.  Perhaps this relates to the absence of an interpreter loop.

>>> p.dumps( { 'a': True, 'b': set( ) } )
IPy: '(dp0\nVb\np1\nc__builtin__\nset\np3\n((lp4\ntp5\nRp2\nsVa
\np6\nI01\ns.'
CPy: "(dp0\nS'a'\np1\nI01\nsS'b'\np2\nc__builtin__\nset
\np3\n((lp4\ntp5\nRp6\ns."

You make me think of a more elaborate example.

for k in range( 100 ):
  i= j()
  g= h+ i
  e= f+ g
  c= d+ e
  a= b+ c

Here, j creates a new class dynamically, and returns an instance of
it.  Addition is defined on it but the return type from it varies.

If I read you correctly, IPy can leave hundreds of different addition
stubs laying around at the end of the for-loop, each of which only
gets executed once or twice, each of which was compiled for the exact
combination of types it was called for.

I might construe this to be a degenerate case, and the majority of
times, you'll reexecute stubs enough to outweigh the length of time
the compilation step takes.  If you still do the bounds checking, it
takes extra instructions (C doesn't), but operation switch-case
BINARY_ADD, (PyInt_CheckExact(v) && PyInt_CheckExact(w)), and POP and
TOP, are all handled by the selection of stubs from $addSite.

I'm read from last April:
>>> The most interesting cases to me are the 5 tests where CPython is more than 
>>> 3x faster than IronPython and the other 5 tests where IronPython is more 
>>> than 3x faster than CPython.  CPython's strongest performance is in 
>>> dictionaries with integer and string keys, list slicing, small tuples and 
>>> code that actually throws and catches exceptions.  IronPython's strongest 
>>> performance is in calling builtin functions, if/then/else blocks, calling 
>>> python functions, deep recursion, and try/except blocks that don't actually 
>>> catch an exception.
<<< 
http://lists.ironpython.com/pipermail/users-ironpython.com/2007-April/004773.html

It's interesting that CPython can make those gains still by using a
stack implementation.

I'll observe that IronPython has the additional dependency of the
full .NET runtime.  (It was my point 7/18 about incorporating the GNU
libs, that to compile to machine-native, as a JIT does, you need the
instruction set of the machine.)   Whereas, CPython can disregard
them, having already been compiled for it.

I think what I was looking for is that IronPython employs the .NET to
compile to machine instructions, once it's known what the values of
the variables are that are the operands.  The trade-off is compilation
time + type checks + stub look-up.

What I want to know is, if __add__ performs an attribute look-up, is
that optimized in any way, after the IP is already in compiled code?

After all that, I don't feel so guilty about stepping on Tim's toes.

On Jul 30, 12:12 am, Dino Viehland <[EMAIL PROTECTED]>
wrote:
> IronPython doesn't have an interpreter loop

RE: interpreter vs. compiled

2008-07-29 Thread Dino Viehland
IronPython doesn't have an interpreter loop and therefore has no POP / TOP / 
etc...   Instead what IronPython has is a method call Int32Ops.Add which looks 
like:

public static object Add(Int32 x, Int32 y) {
    long result = (long) x + y;
    if (Int32.MinValue <= result && result <= Int32.MaxValue) {
        return Microsoft.Scripting.Runtime.RuntimeHelpers.Int32ToObject((Int32)(result));
    }
    return BigIntegerOps.Add((BigInteger)x, (BigInteger)y);
}

This is the implementation of int.__add__.  Note that calling int.__add__ can 
actually return NotImplemented and that's handled by the method binder looking 
at the strong typing defined on Add's signature here - and then automatically 
generating the NotImplemented result when the arguments aren't ints.  So that's 
why you don't see that here even though it's the full implementation of 
int.__add__.
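The overflow check in Int32Ops.Add can be mirrored in Python for illustration. Python ints are already arbitrary-precision, so the "BigInteger" fallback is simulated with a tag (`int32_add` and the tags are inventions of this sketch, not IronPython API):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def int32_add(x, y):
    """Mimic Int32Ops.Add: box a 32-bit result when it fits,
    otherwise fall back to the unbounded representation."""
    result = x + y                    # like the widening (long) addition
    if INT32_MIN <= result <= INT32_MAX:
        return ("int32", result)      # RuntimeHelpers.Int32ToObject analogue
    return ("bigint", result)         # BigIntegerOps.Add analogue

print(int32_add(2, 2))                # ('int32', 4)
print(int32_add(INT32_MAX, 1))        # ('bigint', 2147483648)
```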

Ok, next if you define a function like:

def adder(a, b):
    return a + b

this turns into a .NET method, which will get JITed, which in C# would look 
something like like:

static object adder(object a, object b) {
    return $addSite.Invoke(a, b);
}

where $addSite is a dynamically updated call site.

$addSite knows that it's performing addition and knows how to do nothing other 
than update the call site the 1st time it's invoked.  $addSite is local to the 
function so if you define another function doing addition it'll have its own 
site instance.

So the 1st thing the call site does is a call back into the IronPython runtime 
which starts looking at a & b to figure out what to do.  Python defines that as 
try __add__, maybe try __radd__, handle coercion, etc...  So we go looking 
through finding the __add__ method - if that can return NotImplemented then we 
find the __radd__ method, etc...  In this case we're just adding two integers 
and we know that the implementation of Add() won't return NotImplemented - so 
there's no need to call __radd__.  We know we don't have to worry about 
NotImplemented because the Add method doesn't have the .NET attribute 
indicating it can return NotImplemented.
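The binding sequence described here — try __add__, fall back to __radd__ on NotImplemented — can be sketched directly in Python. This is a simplified version of the real rules (actual lookup also gives __radd__ priority when the right operand's type is a proper subclass of the left's, and uses slots rather than getattr):

```python
def binary_add(a, b):
    # try the left operand's __add__ first
    meth = getattr(type(a), "__add__", None)
    if meth is not None:
        result = meth(a, b)
        if result is not NotImplemented:
            return result
    # fall back to the right operand's __radd__
    rmeth = getattr(type(b), "__radd__", None)
    if rmeth is not None:
        result = rmeth(b, a)
        if result is not NotImplemented:
            return result
    raise TypeError("unsupported operand types: %s, %s"
                    % (type(a).__name__, type(b).__name__))

class OnlyRadd:
    def __radd__(self, other):
        return ("radd", other)

print(binary_add(2, 3))           # 5, via int.__add__
print(binary_add(2, OnlyRadd()))  # ('radd', 2), via the __radd__ fallback
```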

At this point we need to do two things.  We need to generate the test which is 
going to see if future arguments are applicable to what we just figured out and 
then we need to generate the code which is actually going to handle this.  That 
gets combined together into the new call site delegate and it'll look something 
like:

static object CallSiteStub(CallSite site, object a, object b) {
    if (a != null && a.GetType() == typeof(int) && b != null && b.GetType() == typeof(int)) {
        return IntOps.Add((int)a, (int)b);
    }
    return site.UpdateBindingAndInvoke(a, b);
}

That gets compiled down as a lightweight dynamic method which also gets JITed.  
The next time through the call site's Invoke body will be this method and 
things will go really fast if we have int's again.  Also notice this is looking 
an awful lot like the inlined/fast-path(?) code dealing with int's that you 
quoted.  If everything was awesome (currently it's not for a couple of reasons) 
the JIT would even inline the IntOps.Add call and it'd probably be near 
identical.  And everything would be running native on the CPU.

So that's how 2 + 2 works...  Finally if it's a user type then we'd generate a 
more complicated test like (and getting more and more pseudo code to keep 
things simple):

if (PythonOps.CheckTypeVersion(a, 42) && PythonOps.CheckTypeVersion(b, 42)) {
    return $callSite.Invoke(__cachedAddSlot__.__get__(a), b);
}

Here $callSite is another stub which will handle doing optimal dispatch to 
whatever __add__.__get__ will return.  It could be a Python type, it could be a 
user defined function, it could be the Python built-in sum function, etc...  so 
that's the reason for the extra dynamic dispatch.

So in summary: everything is compiled to IL.  At runtime we have lots of stubs 
all over the place which do the work to figure out the dynamic operation and 
then cache the result of that calculation.

Also what I've just described is how IronPython 2.0 works.  IronPython 1.0 is 
basically the same but mostly w/o the stubs and where we use stub methods 
they're much less sophisticated.

Also, IronPython is open source - www.codeplex.com/IronPython

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of castironpi
Sent: Tuesday, July 29, 2008 9:20 PM
To: python-list@python.org
Subject: Re: interpreter vs. compiled

On Jul 29, 7:39 am, alex23 <[EMAIL PROTECTED]> wrote:
> On Jul 29, 2:21 pm, castironpi <[EMAIL PROTECTED]> wrote:
>
> > On Jul 28, 5:58 pm, Fuzzyman <[EMAIL PROTECTED]> wrote:
> > > Well - in IronPython user code gets compiled to in memory assemblies
> > > which can be JIT'ed.
>
> > I don't believe so.
>
> Uh, you're questioning someone who is not only co-author of a book on
> IronPython, but also a developer on one of the first IronPython-based
> c

RE: Questions on 64 bit versions of Python

2008-07-25 Thread Dino Viehland
The end result of that is on a 32-bit machine IronPython runs in a 32-bit 
process and on a 64-bit machine it runs in a 64-bit process.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Mike Driscoll
Sent: Friday, July 25, 2008 5:58 AM
To: python-list@python.org
Subject: Re: Questions on 64 bit versions of Python

On Jul 25, 5:52 am, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> M.-A. Lemburg wrote:
> >> 4. Is there a stable version of IronPython compiled under a 64 bit
> >> version of .NET? Anyone have experience with such a beast?
>
> > Can't comment on that one.
>
> Should that matter?  Isn't IronPython pure CLR?
>
> 

IronPython is written in C# and runs in/with the CLR, if that's what
you mean. IronPython 1 works with the CLR and is equivalent to
Python 2.4, whereas IronPython 2 works with the DLR and is equivalent
to Python 2.5.

Mike
--
http://mail.python.org/mailman/listinfo/python-list



trinity school defender

2008-06-03 Thread Dino Dragovic

In the above-mentioned flyer, in line 8:

"there are more than 60,000 viruses and other harmful programs"

viruses alone number several hundred thousand; together with potentially 
harmful applications and other malicious code, the figure exceeds a million

--
http://mail.python.org/mailman/listinfo/python-list


RE: Is there a way to use .NET DLL from Python

2008-02-12 Thread Dino Viehland
>>
>> Oh, I know what you mean.
>> But that was exactly the reason for having a .DLLs folder, isn't it?
>> When you place an assembly into this folder, you avoid having to write
>> this boilerplate code, and simply import the assembly as you would
>> with a normal python module. At least, that's how it worked in
>> previous versions...
>No. You have always had to add references to assemblies before being
>able to use the namespaces they contain. You even have to do this with
>C# in Visual Studio.

This *should* work in both IronPython 1.x and IronPython 2.0 - the catch though 
is that it's implemented in the default site.py we ship with.  So if you do the 
usual thing and use CPython's Lib directory you'll lose this feature w/o 
copying it over.
-- 
http://mail.python.org/mailman/listinfo/python-list


RE: Can IronPython work as Windows Scripting Host (WSH) language?

2007-06-28 Thread Dino Viehland
Currently IronPython doesn't support being hosted in WSH.  It's something we've 
discussed internally in the past but we've never had the cycles to make it work.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of misiek3d
Sent: Thursday, June 28, 2007 3:07 AM
To: python-list@python.org
Subject: Can IronPython work as Windows Scripting Host (WSH) language?

Hello
I want to use IronPython as a Windows Scripting Host language. Is it
possible? How can I do it? I know that ActivePython works as a WSH
language but for specific reasons I need to use IronPython.

regards
Michal

--
http://mail.python.org/mailman/listinfo/python-list


RE: ironpython exception line number

2007-06-28 Thread Dino Viehland
Given a file foo.py:

def f():

You should get these results:

IronPython 1.0.60816 on .NET 2.0.50727.312
Copyright (c) Microsoft Corporation. All rights reserved.
>>> try:
... execfile('foo.py')
... except IndentationError, e:
... import sys
... x = sys.exc_info()
...
>>> print x[1].filename, x[1].lineno, x[1].msg, x[1].offset, x[1].text, x[1].args
foo.py 2 unexpected token  1  ('unexpected token ', ('foo.py', 2, 1, ''))
>>>
>>>

Which is very similar to the result you get from CPython although we seem to 
disagree about what we expect next.

Python 2.5 (r25:51908, Sep 19 2006, 09:52:17) [MSC v.1310 32 bit (Intel)] on 
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> try:
... execfile('foo.py')
... except IndentationError, e:
... import sys
... x = sys.exc_info()
...
>>> print x[1].filename, x[1].lineno, x[1].msg, x[1].offset, x[1].text, x[1].args
foo.py 2 expected an indented block 9  ('expected an indented block', ('foo.py', 2, 9, ''))
>>> ^Z


If you're hosting IronPython and catching this from a .NET language then you'll 
be catching the .NET exception.  In that case you can access the original 
Python exception from ex.Data["PythonExceptionInfo"].  Alternately you could 
catch PythonSyntaxErrorException and access its properties (Line, Column, 
FileName, LineText, Severity, and ErrorCode).
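For reference, the same experiment in modern Python 3 syntax (`except ... as e` replaces the Python 2 form, and a plain `compile` replaces `execfile`) — a hypothetical adaptation; modern CPython reports this as IndentationError, a subclass of SyntaxError:

```python
# compile a module whose "def f():" line has no indented body
try:
    compile("def f():\n", "foo.py", "exec")
except SyntaxError as e:  # IndentationError is a subclass of SyntaxError
    info = (type(e).__name__, e.filename, e.lineno, e.msg)

print(info)  # e.g. ('IndentationError', 'foo.py', 2, 'expected an indented block...')
```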

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Troels Thomsen
Sent: Tuesday, June 26, 2007 1:33 PM
To: python-list@python.org
Subject: ironpython exception line number


Hello ,

When an exception occurs in an IronPython-executed script and I print the
sys.exc, I get something ugly like the example below.
How can I get the fileName and line number?

Thx in advance
Troels


26-06-2007 13:19:04 : IronPython.Runtime.Exceptions.PythonIndentationError:
unexpected token def
   ved IronPython.Compiler.SimpleParserSink.AddError(String path, String
message, String lineText, CodeSpan span, Int32 errorCode, Severity severity)

   ved IronPython.Compiler.CompilerContext.AddError(String message, String
lineText, Int32 startLine, Int32 startColumn, Int32 endLine, Int32
endColumn, Int32 errorCode, Severity severity)

   ved IronPython.Compiler.Parser.ReportSyntaxError(Location start, Location
end, String message, Int32 errorCode)
   ved IronPython.Compiler.Parser.ReportSyntaxError(Token t, Int32
errorCode, Boolean allowIncomplete)
   ved IronPython.Compiler.Parser.ParseSuite()
   ved IronPython.Compiler.Parser.ParseFuncDef()
   ved IronPython.Compiler.Parser.ParseStmt()
   ved IronPython.Compiler.Parser.ParseSuite()
   ved IronPython.Compiler.Parser.ParseClassDef()
   ved IronPython.Compiler.Parser.ParseStmt()
   ved IronPython.Compiler.Parser.ParseFileInput()
   ved IronPython.Hosting.PythonEngine.Compile(Parser p, Boolean
debuggingPossible)
   ved IronPython.Hosting.PythonEngine.CompileFile(String fileName)
   ved IronPython.Hosting.PythonEngine.ExecuteFile(String fileName)



--
http://mail.python.org/mailman/listinfo/python-list


RE: IronPython 1.0 - Bugs or Features?

2006-09-06 Thread Dino Viehland
Yes, IronPython generates IL which the JIT will then compile when the method is 
invoked - so our parse/compile time is slower due to this.  We've experimented 
w/ a fully interpreted mode (which can be enabled with -X:FastEval) where we 
walk the generated AST instead of compiling it, but that mode doesn't 
necessarily pass all the tests (and would get worse performance for long 
running code).

There are other issues w/ startup time as well besides this though that we need 
to fix (for example we load all the types in mscorlib & System before we drop 
you into the interpreter, which is a lot of types to be loading...).  I suspect 
that for a small code snippet it's issues like these that are the most 
noticeable.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Super Spinner
Sent: Wednesday, September 06, 2006 4:03 PM
To: python-list@python.org
Subject: Re: IronPython 1.0 - Bugs or Features?


Claudio Grondi wrote:
> tjreedy wrote:
> > "Claudio Grondi" <[EMAIL PROTECTED]> wrote in message
> > news:[EMAIL PROTECTED]
> >
> >>I also erroneously assumed, that the first problem was detected
> >>during parsing ... so, by the way: how can I distinguish an error
> >>raised while parsing the code and an error raised when actually running the 
> >>code?
> >
> >
> > Parsing detects and reports syntax errors and maybe something else
> > if you use non-ascii chars without matching coding cookie.  Other
> > errors are runtime.
> Let's consider
>print '"Data   ê"'
>
> In CPython 2.4.2 there is in case of non-ascii character:
>sys:1: DeprecationWarning: Non-ASCII character '\xea' in file
> C:\IronPython-1.0-BugsOrFeatures.py on line 3, but no encoding
> declared; see http://www.python.org/peps/pep-0263.html for details
> "Data♀♂ Û"
>
> IronPython does not raise any warning and outputs:
> "Data♀♂ ?"
>
> So it seems that IronPython is not as close to CPython as I had
> expected.
> It takes much more time to run this above simple script in IronPython
> as in CPython - it feels as IronPython were extremely busy with
> starting itself.
>
> Claudio Grondi

IronPython is a .NET language, so does that mean that it invokes the JIT before 
running actual code?  If so, then "simple short scripts"
would take longer with IronPython "busy starting itself" loading .NET and 
invoking the JIT.  This effect would be less noticable, the longer the program 
is.  But I'm just guessing; I've not used IronPython.

--
http://mail.python.org/mailman/listinfo/python-list

RE: IronPython 1.0 - Bugs or Features?

2006-09-06 Thread Dino Viehland
Warnings is one of the features that didn't quite make it for v1.0.  In general 
w.r.t. non-ASCII characters you'll find IronPython to be more like Jython in 
that all strings are Unicode strings.  But other than that we do support 
PEP-263 for the purpose of defining alternate file encodings.

We're also aware of the startup time and will be working on reducing that in 
the future.  Thanks for the feedback!

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Claudio Grondi
Sent: Wednesday, September 06, 2006 1:47 PM
To: python-list@python.org
Subject: Re: IronPython 1.0 - Bugs or Features?

tjreedy wrote:
> "Claudio Grondi" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
>
>>I also erroneously assumed, that the first problem was detected during
>>parsing ... so, by the way: how can I distinguish an error raised
>>while parsing the code and an error raised when actually running the code?
>
>
> Parsing detects and reports syntax errors and maybe something else if
> you use non-ascii chars without matching coding cookie.  Other errors
> are runtime.
Let's consider
   print '"Data   ê"'

In CPython 2.4.2 there is in case of non-ascii character:
   sys:1: DeprecationWarning: Non-ASCII character '\xea' in file 
C:\IronPython-1.0-BugsOrFeatures.py on line 3, but no encoding declared; see 
http://www.python.org/peps/pep-0263.html for details "Data♀♂ Û"

IronPython does not raise any warning and outputs:
"Data♀♂ ?"

So it seems that IronPython is not as close to CPython as I had expected.
It takes much more time to run this above simple script in IronPython as in 
CPython - it feels as IronPython were extremely busy with starting itself.

Claudio Grondi
--
http://mail.python.org/mailman/listinfo/python-list

RE: Determining if an object is a class?

2006-07-12 Thread Dino Viehland
The first check is also off - it should be if issubclass(type(Test), type): 
otherwise you miss the metaclass case:

class foo(type): pass

class Test(object):
    __metaclass__ = foo

obj = Test
if type(obj) == type: 'class obj'
else: 'not a class'

just on the off-chance you run into a metaclass :)
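The fix can be verified directly. With a metaclass, `type(obj) == type` is False even though `obj` is a class, while `issubclass(type(obj), type)` (or, equivalently, `isinstance(obj, type)`) still detects it. A quick check, written with Python 3 metaclass syntax rather than the original's Python 2 `__metaclass__`:

```python
class Meta(type):
    pass

class Test(metaclass=Meta):
    pass

obj = Test
print(type(obj) == type)            # False: obj's type is Meta, not type
print(issubclass(type(obj), type))  # True: Meta subclasses type
print(isinstance(obj, type))        # True: the idiomatic spelling
```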

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Clay Culver
Sent: Wednesday, July 12, 2006 2:07 PM
To: python-list@python.org
Subject: Re: Determining if an object is a class?

Ahh much better.  Thanks.

--
http://mail.python.org/mailman/listinfo/python-list


RE: Undocumented alternate form for %#f ?

2006-04-28 Thread Dino Viehland
Ahh, cool...  Thanks for the explanation!

Do you want to help develop Dynamic languages on CLR? 
(http://members.microsoft.com/careers/search/details.aspx?JobID=6D4754DE-11F0-45DF-8B78-DC1B43134038)
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dave Hughes
Sent: Friday, April 28, 2006 1:00 PM
To: python-list@python.org
Subject: Re: Undocumented alternate form for %#f ?

Dino Viehland wrote:

> I'm assuming this is by-design, but it doesn't appear to be
> documented:
>
> >>> '%8.f' % (-1)
> '  -1'
> >>> '%#8.f' % (-1)
> ' -1.'
>
>
> The docs list the alternate forms, but there isn't one listed for
> f/F.   It would seem the alternate form for floating points is
> truncate & round the floating point value, but always display the .
> at the end.  Is that correct?

The Python % operator follows the C sprintf function pretty darn
closely in behaviour (hardly surprising really, though I've never
peeked at the implementation). Hence "man sprintf" can provide some
clues here. From man sprintf on my Linux box:

#
The  value  should be converted to an ``alternate form''.  For o
conversions, the first character of the output  string  is  made
zero (by prefixing a 0 if it was not zero already).  For x and X
conversions, a non-zero result has the string `0x' (or `0X'  for
X  conversions) prepended to it.  For a, A, e, E, f, F, g, and G
conversions, the result will always  contain  a  decimal  point,
even  if  no digits follow it (normally, a decimal point appears
in the results of those conversions only if  a  digit  follows).
For g and G conversions, trailing zeros are not removed from the
result as they would otherwise be.  For other  conversions,  the
result is undefined.

Hence, I don't think it's the # doing the truncating here, but it
certainly is producing the mandatory decimal point. If you get rid of
the "." in the specification, it uses the default decimal precision (6):

>>> "%8f" % (-1)
'-1.000000'
>>> "%#8f" % (-1)
'-1.000000'

No difference with the alternate specification here as the precision is
non-zero. Again, from man sprintf:

The precision
[snip]
If the precision is given as
just `.', or the precision is negative, the precision is  taken  to  be
zero.   This  gives the minimum number of digits to appear for d, i, o,
u, x, and X conversions, the number of digits to appear after the radix
character  for  a, A, e, E, f, and F conversions, the maximum number of
significant digits for g and G conversions, or the  maximum  number  of
characters to be printed from a string for s and S conversions.
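Dave's reading checks out in the interpreter: with an empty precision (`.`, meaning zero digits), the only visible effect of `#` on `%f` is the mandatory decimal point, while at non-zero precision the flag changes nothing.

```python
# precision ".": zero digits after the point
print('%8.f' % -1)    # '      -1'  (no point without the # flag)
print('%#8.f' % -1)   # '     -1.'  (point forced by #)
# non-zero precision: # makes no visible difference
print('%8.2f' % -1)   # '   -1.00'
print('%#8.2f' % -1)  # '   -1.00'
```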


HTH,

Dave.
--

--
http://mail.python.org/mailman/listinfo/python-list


Undocumented alternate form for %#f ?

2006-04-28 Thread Dino Viehland
I'm assuming this is by-design, but it doesn't appear to be documented:

>>> '%8.f' % (-1)
'  -1'
>>> '%#8.f' % (-1)
' -1.'


The docs list the alternate forms, but there isn't one listed for f/F.   It 
would seem the alternate form for floating points is truncate & round the 
floating point value, but always display the . at the end.  Is that correct?




-- 
http://mail.python.org/mailman/listinfo/python-list