Re: tree representation of Python data

2023-02-08 Thread Shaozhong SHI
What is a robust way to use Python to read in an XML file and turn it into a
JSON file?

A JSON dictionary is actually a tree, and tree-structured data is much easier
to manage.
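
One possible approach, sketched with the standard library only: walk the XML tree with xml.etree.ElementTree and build nested dicts that json can serialise. The "@attributes" and "#text" key names and the sample document are assumptions made for illustration; XML has no single agreed mapping to JSON.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element into a plain dict (a tree of dicts and lists)."""
    node = {}
    if elem.attrib:
        # Attributes go under their own key so they cannot clash with child tags.
        node["@attributes"] = dict(elem.attrib)
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    for child in elem:
        # Repeated child tags are collected into a list under the tag name.
        node.setdefault(child.tag, []).append(element_to_dict(child))
    return node


def xml_to_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


# Hypothetical sample document.
sample = "<catalog><book id='1'><title>Python</title></book><book id='2'/></catalog>"
print(xml_to_json(sample))
```

Every child tag becomes a list here, even when it occurs once, so code reading the result does not need to special-case single elements.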

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Missing global # gdal DRIVER_NAME declaration in gdal_array.py

2022-03-08 Thread Shaozhong SHI
The following warning kept coming up when running ogr2ogr.

Warning 1: Missing global # gdal: DRIVER_NAME declaration in
C:\Users\AppData\Local\Programs\Python\Python36\Lib\site-packages\osgeo\gdal_array.py

What steps should I take to resolve this issue?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Issues of pip install gdal and fiona

2022-03-06 Thread Shaozhong SHI
I downloaded .whl files for fiona and gdal to go with Python 3.6.5.

However, I am getting red error messages.

Although GDAL is now working, there is a warning message: Missing global #
gdal: DRIVER_NAME declaration in gdal_array.py

Can anyone advise on how to resolve the issues?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Re: Long running process - how to speed up?

2022-02-20 Thread Shaozhong SHI
On Sat, 19 Feb 2022 at 18:51, Alan Gauld  wrote:

> On 19/02/2022 11:28, Shaozhong SHI wrote:
>
> > I have a CSV file of 932956 rows
>
> That's not a lot in modern computing terms.
>
> > and have to have time.sleep in a Python
> > script.
>
> Why? Is it a requirement by your customer? Your manager?
> time.sleep() is not usually helpful if you want to do
> things quickly.
>
> > It takes a long time to process.
>
> What is a "long time"? minutes? hours? days? weeks?
>
> It should take a million times as long as it takes to
> process one row. But you have given no clue what you
> are doing in each row.
> - reading a database?
> - reading from the network? or the internet?
> - writing to a database? or the internet?
> - performing highly complex math operations?
>
> Or perhaps the processing load is in analyzing the totality
> of the data after reading it all? A very different type
> of problem. But we just don't know.
>
> All of these factors will affect performance.
>
> > How can I speed up the processing?
>
> It all depends on the processing.
> You could try profiling your code to see where the time is spent.
>
> > Can I do multi-processing?
>
> Of course. But there is no guarantee that will speed things
> up if there is a bottleneck on a single resource somewhere.
> But it might be possible to divide and conquer and get better
> speed. It all depends on what you are doing. We can't tell.
>
> We cannot answer such a vague question with any specific
> solution.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
>
> --
> https://mail.python.org/mailman/listinfo/python-list


I do not know these answers yet.  For now, it appears to hang at a certain
point and does not move on.

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-20 Thread Shaozhong SHI
On Sat, 19 Feb 2022 at 19:44, Mats Wichmann  wrote:

> On 2/19/22 05:09, Shaozhong SHI wrote:
> > Can it be divided into several processes?
> > Regards,
> > David
>
> The answer is: "maybe".  Multiprocessing doesn't happen for free, you
> have to figure out how to divide the task up, requiring thought and
> effort. We can't guess to what extent the problem you have is amenable
> to multiprocessing.
>
> Google for "dataframe" and "multiprocessing" and you should get some
> hits (in my somewhat limited experience in this area, people usually
> load the csv data into Pandas before they get started working with it).
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list


I am trying this approach,

import multiprocessing as mp

def my_func(x):
  print(x**x)

def main():
  pool = mp.Pool(mp.cpu_count())
  result = pool.map(my_func, [4,2,3])

if __name__ == "__main__":
  main()

I modified the script and set off a test run.

However, I have no idea whether this approach will be faster than the
conventional approach.

Does anyone have any idea?
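
One way to find out is to time both versions on a small sample. The sketch below is only an illustration: process_row is a hypothetical stand-in for the real per-row work, and the sleep assumes the work is dominated by waiting. It uses multiprocessing.pool.ThreadPool rather than mp.Pool; the two share the same map() interface, but threads only help with waiting, not with CPU-heavy work.

```python
import time
from multiprocessing.pool import ThreadPool


def process_row(row):
    # Stand-in for the real per-row work; the sleep mimics waiting on I/O.
    time.sleep(0.05)
    return row * row


def run_sequential(rows):
    return [process_row(r) for r in rows]


def run_pooled(rows, workers=4):
    # ThreadPool offers the same map() interface as multiprocessing.Pool.
    with ThreadPool(workers) as pool:
        return pool.map(process_row, rows)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


rows = list(range(8))
seq_result, seq_seconds = timed(run_sequential, rows)
pool_result, pool_seconds = timed(run_pooled, rows)
print("sequential: %.2fs, pooled: %.2fs" % (seq_seconds, pool_seconds))
```

If the real work is CPU-bound, swapping ThreadPool for mp.Pool in run_pooled and timing again should show whether separate processes pay off.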

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Long running process - how to speed up?

2022-02-19 Thread Shaozhong SHI
Can it be divided into several processes?
Regards,
David

On Saturday, 19 February 2022, Chris Angelico  wrote:

> On Sat, 19 Feb 2022 at 22:59, Karsten Hilbert 
> wrote:
> >
> > > > I have a CSV file of 932956 rows and have to have time.sleep in a
> > > > Python script.  It takes a long time to process.
> > > >
> > > > How can I speed up the processing?  Can I do multi-processing?
> > > >
> > > Remove the time.sleep()?
> >
> > He's attesting to only having "time.sleep" in there...
> >
> > I doubt removing that will help much ;-)
>
> I honestly don't understand the question, hence offering the
> stupidly-obvious suggestion in the hope that it would result in a
> better question. A million rows of CSV, on its own, isn't all that
> much to process, so it must be the processing itself (of which we have
> no information other than this reference to time.sleep) that takes all
> the time.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Long running process - how to speed up?

2022-02-19 Thread Shaozhong SHI
I have a CSV file of 932956 rows and have to have time.sleep in a Python
script.  It takes a long time to process.

How can I speed up the processing?  Can I do multi-processing?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


URLError:

2022-02-12 Thread Shaozhong SHI
The following is used in a loop to get response code for each url.

print (urllib.request.urlopen(url).getcode())

However, error message says: URLError: 

Python 3.6.5 is being used to test whether url is live or not.

Can anyone shed light on this?
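
A minimal sketch of how the loop body could report why a URL fails instead of stopping at the first URLError. The function name check_url and the ten-second timeout are assumptions for illustration; the exception classes are the ones urllib itself raises.

```python
import urllib.error
import urllib.request


def check_url(url, timeout=10):
    """Return (status_code, detail); status_code is None if no HTTP reply arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.getcode(), "ok"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 500.
        return exc.code, str(exc.reason)
    except urllib.error.URLError as exc:
        # No usable reply: DNS failure, refused connection, SSL problem, ...
        return None, str(exc.reason)
    except OSError as exc:
        # Low-level socket errors, including timeouts while reading.
        return None, str(exc)
    except ValueError as exc:
        # Raised for malformed input such as a URL with no scheme.
        return None, str(exc)


print(check_url("not-a-url"))
```

HTTPError is caught before URLError because it is a subclass of it. Printing the detail for each failing URL usually shows whether the problem is the address, the network, or the server.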

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to set environmental variables for Python

2022-01-17 Thread Shaozhong SHI
Set in the operating system, but without disturbing the existing settings.  I
only want to add them at the command line.
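
A sketch of one way to add a variable without touching the existing settings: copy the current environment, add to the copy, and pass the copy only to the process that needs it. MY_TOOL_HOME and its value are hypothetical names chosen for illustration.

```python
import os
import subprocess
import sys

# Copy the current environment so the existing settings stay untouched.
child_env = os.environ.copy()
child_env["MY_TOOL_HOME"] = "/opt/my_tool"  # hypothetical variable and value

# Only this child process sees the addition.
completed = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['MY_TOOL_HOME'])"],
    env=child_env,
    stdout=subprocess.PIPE,
    universal_newlines=True,
    check=True,
)
print(completed.stdout.strip())
```

Here sys.executable is whichever interpreter runs the script; the path to a specific Python 3.6 executable could be used in its place.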

Regards,

David

On Mon, 17 Jan 2022 at 10:57, dn via Python-list 
wrote:

> On 17/01/2022 22.31, Shaozhong SHI wrote:
> > I got quite a few version of Python on my machine.
> >
> > How do I set environmental variables for Python 3.6.1 to work?
>
>
> Set from Python, or set in the OpSys?
>
> https://docs.python.org/3/library/os.html?highlight=environment%20variable
>
> MS-Win: https://docs.python.org/3/using/windows.html#setting-envvars
> --
> Regards,
> =dn
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


How to set environmental variables for Python

2022-01-17 Thread Shaozhong SHI
I have quite a few versions of Python on my machine.

How do I set environmental variables for Python 3.6.1 to work?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Can Python call and use FME modules and functions such as StreamOrderCalculator?

2021-12-23 Thread Shaozhong SHI
Can we do something like import an fme.something and make use of FME
modules and functions?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


ogr2ogr can not open gfs file when loading GML

2021-12-14 Thread Shaozhong SHI
My command line kept telling me that ogr2ogr cannot open the .gfs file, even
though it does find it.

I was trying to load GML into PostGIS.

Alternatively, how do I specify an XSD file to go along with reading GML files?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Installation of GeoPandas - failed at fiona

2021-12-01 Thread Shaozhong SHI
I am trying to install geopandas.

I navigated to C:\ProgramData\Anaconda3\Scripts>
and typed in 'pip install geopandas'.

It ran but failed at fiona.

I tried import geopandas as gp, but the error message says: No module named
'geopandas'.

Can anyone help?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: System, configuration and Python performance

2021-11-01 Thread Shaozhong SHI
On Tue, 2 Nov 2021 at 00:20, Shaozhong SHI  wrote:

>
>
> On Tue, 2 Nov 2021 at 00:09, MRAB  wrote:
>
>> On 2021-11-01 23:02, Shaozhong SHI wrote:
>> > How to configure to improve Python performance in a system like the
>> > following:
>> >
>> > Windows 10
>> >
>> > System
>> >
>> > Processor Intel(R) Core(TM) i7-9700 CPU @3.60GHz, 3.60 GHz
>> > Installed memory (RAM) 32.0 GB (31.8 GB usable)
>> > System type: 64-bit Operating System, x64-based processor
>> >
>> > I found that the Python script was running slowly and wanted to find out
>> > what is going on and what activities it is doing.
>> >
>> > I opened the Task Manager and found that there is not much CPU usage.
>> >
>> > Do I need to do something like configuration to improve Python's
>> > performance?
>> >
>> If CPU usage is low, then that isn't the cause of the slowness.
>>
>> What about disk usage?
>>
>> What about network usage?
>>
>> If it's communicating across the internet, then it might be waiting for
>> the other end. If that's the case, then there's probably not much you
>> can do about it.
>>
>
> Both disk usage and network usage are very low as well.
>
> It is checking out responses of internet pages with given URLs.
>
> It is checking out whether each url is valid or not.
>
>
>
>> --
>> https://mail.python.org/mailman/listinfo/python-list
>>
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: System, configuration and Python performance

2021-11-01 Thread Shaozhong SHI
On Tue, 2 Nov 2021 at 00:09, MRAB  wrote:

> On 2021-11-01 23:02, Shaozhong SHI wrote:
> > How to configure to improve Python performance in a system like the
> > following:
> >
> > Windows 10
> >
> > System
> >
> > Processor Intel(R) Core(TM) i7-9700 CPU @3.60GHz, 3.60 GHz
> > Installed memory (RAM) 32.0 GB (31.8 GB usable)
> > System type: 64-bit Operating System, x64-based processor
> >
> > I found that the Python script was running slowly and wanted to find out
> > what is going on and what activities it is doing.
> >
> > I opened the Task Manager and found that there is not much CPU usage.
> >
> > Do I need to do something like configuration to improve Python's
> > performance?
> >
> If CPU usage is low, then that isn't the cause of the slowness.
>
> What about disk usage?
>
> What about network usage?
>
> If it's communicating across the internet, then it might be waiting for
> the other end. If that's the case, then there's probably not much you
> can do about it.
>

Both disk usage and network usage are very low as well.

The script checks the responses of web pages at the given URLs.
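
Low CPU, disk and network usage together suggest the script spends most of its time waiting for each server to answer. A hedged sketch of overlapping those waits with a thread pool follows; is_reachable, check_all and the worker count are assumptions for illustration, not the script under discussion.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def is_reachable(url, timeout=5):
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def check_all(urls, check=is_reachable, workers=16):
    # Threads overlap the waiting time; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return dict(zip(urls, executor.map(check, urls)))


print(check_all(["not-a-url"]))
```

The check argument makes it possible to try the pool with a fake checker before pointing it at real URLs.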

Regards,

David




> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


System, configuration and Python performance

2021-11-01 Thread Shaozhong SHI
How do I configure a system like the following to improve Python
performance?

Windows 10

System

Processor Intel(R) Core(TM) i7-9700 CPU @3.60GHz, 3.60 GHz
Installed memory (RAM) 32.0 GB (31.8 GB usable)
System type: 64-bit Operating System, x64-based processor

I found that the Python script was running slowly and wanted to find out
what was going on and what it was doing.

I opened the Task Manager and found that there is not much CPU usage.

Do I need to do something like configuration to improve Python's
performance?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to apply a self defined function in Pandas

2021-10-31 Thread Shaozhong SHI
On Sun, 31 Oct 2021 at 18:42, Shaozhong SHI  wrote:

>
>
> On Sunday, 31 October 2021, Albert-Jan Roskam 
> wrote:
>
>>
>>
>> > df['URL'] = df.apply(lambda x:  connect(df['URL']), axis=1)
>>
>>
>> I think you need axis=0. Or use the Series, df['URL'] =
>> df.URL.apply(connect)
>>
>

> Just experimented with your suggestion, but have not seen any difference.
>

Regards, David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to apply a self defined function in Pandas

2021-10-31 Thread Shaozhong SHI
On Sun, 31 Oct 2021 at 19:28, MRAB  wrote:

> On 2021-10-31 18:48, Shaozhong SHI wrote:
> >
> > On Sunday, 31 October 2021, MRAB  wrote:
> >
> > On 2021-10-31 17:25, Shaozhong SHI wrote:
> >
> > I defined a function and apply it to a column in Pandas.  But
> > it does not
> > return correct values.
> >
> > I am trying to test which url in a column full of url to see
> > which one can
> > be connected to or not
> >
> > def connect(url):
> >     try:
> >         urllib.request.urlopen(url)
> >         return True
> >     except:
> >         return False
> >
> > df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1)
> >
> > I ran without any error, but did not return any true.
> >
> > I just could not find any error with it.
> >
> > Can anyone try and find out why
> >
> > You're passing a function to '.apply'. That has one argument, 'x'.
> >
> > But what is the function doing with that argument?
> >
> > Nothing.
> >
> > The function is just returning the result of connect(df['URL']).
> >
> > df['URL'] is a column, so you're passing a column to '.urlopen',
> > which, of course, it doesn't understand.
> >
> > So 'connect' returns False.
> >
> >
> > Please expand on how.
> >
> It's as simple as passing 'connect' to '.apply' as  the first argument.
>


Well, can you expand on the simplicity?
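
For reference, a sketch of what passing 'connect' to '.apply' could look like. The file:// URLs, the temporary directory and the new 'reachable' column are assumptions that keep the demo offline; real data would hold http(s) URLs.

```python
import tempfile
import urllib.request
from pathlib import Path

import pandas as pd


def connect(url):
    try:
        with urllib.request.urlopen(url, timeout=10):
            return True
    except (OSError, ValueError):
        return False


# file:// URLs keep the demo offline; real data would hold http(s) URLs.
with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "page.html"
    existing.write_text("<html></html>")
    missing = Path(tmp) / "missing.html"
    df = pd.DataFrame({"URL": [existing.as_uri(), missing.as_uri()]})

    # apply() on the Series calls connect once per cell, with one URL string.
    df["reachable"] = df["URL"].apply(connect)

print(df["reachable"].tolist())
```

The difference from the lambda version is that connect receives one string at a time rather than the whole column.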

Regards, David

> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to apply a self defined function in Pandas

2021-10-31 Thread Shaozhong SHI
On Sunday, 31 October 2021, MRAB  wrote:

> On 2021-10-31 17:25, Shaozhong SHI wrote:
>
>> I defined a function and apply it to a column in Pandas.  But it does not
>> return correct values.
>>
>> I am trying to test which url in a column full of url to see which one can
>> be connected to or not
>>
>> def connect(url):
>>     try:
>>         urllib.request.urlopen(url)
>>         return True
>>     except:
>>         return False
>>
>> df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1)
>>
>> I ran without any error, but did not return any true.
>>
>> I just could not find any error with it.
>>
>> Can anyone try and find out why
>>
> You're passing a function to '.apply'. That has one argument, 'x'.
>
> But what is the function doing with that argument?
>
> Nothing.
>
> The function is just returning the result of connect(df['URL']).
>
> df['URL'] is a column, so you're passing a column to '.urlopen', which, of
> course, it doesn't understand.
>
> So 'connect' returns False.
>
>
Please expand on how.

David

> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to apply a self defined function in Pandas

2021-10-31 Thread Shaozhong SHI
On Sunday, 31 October 2021, Albert-Jan Roskam 
wrote:

>
>
> > df['URL'] = df.apply(lambda x:  connect(df['URL']), axis=1)
>
>
> I think you need axis=0. Or use the Series, df['URL'] =
> df.URL.apply(connect)
>
Any details?
I will try and let you know.  Regards, David
-- 
https://mail.python.org/mailman/listinfo/python-list


How to apply a self defined function in Pandas

2021-10-31 Thread Shaozhong SHI
I defined a function and applied it to a column in Pandas, but it does not
return the correct values.

I am trying to test each URL in a column full of URLs to see which ones can
be connected to.

def connect(url):
    try:
        urllib.request.urlopen(url)
        return True
    except:
        return False

df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1)

It ran without any error, but did not return any True.

I just could not find any error in it.

Can anyone try to find out why?


Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Python script seems to stop running when handling very large dataset

2021-10-30 Thread Shaozhong SHI
On Saturday, 30 October 2021, Dieter Maurer  wrote:

> Shaozhong SHI wrote at 2021-10-29 23:42 +0100:
> >Python script works well, but seems to stop running at a certain point
> >when handling very large dataset.
> >
> >Can anyone shed light on this?
>
> Some algorithms have non linear runtime.
>
>
> For example, it is quite easy to write code with
> quadratic runtime in Python:
>   s = ""
>   for x in ...: s += f(x)
> You will see the problem only for large data sets.
>

Has anyone compared this with iterrows?  Which looping option is faster?
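
A small sketch of how the string-building pattern quoted above could be measured against the usual alternative, str.join. The function names and the input size are assumptions, and the timings depend on the interpreter, so no numbers are claimed here. It does not cover iterrows, which is a separate question about looping over DataFrame rows.

```python
import timeit


def build_with_concat(items):
    s = ""
    for x in items:
        s += str(x)  # may copy the growing string on each pass
    return s


def build_with_join(items):
    # Collect the pieces first and join once at the end.
    return "".join(str(x) for x in items)


items = list(range(20000))
concat_seconds = timeit.timeit(lambda: build_with_concat(items), number=5)
join_seconds = timeit.timeit(lambda: build_with_join(items), number=5)
print("concat: %.3fs, join: %.3fs" % (concat_seconds, join_seconds))
```

Running this with several input sizes shows how each version scales, which matters more than a single measurement.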
Regards, David
-- 
https://mail.python.org/mailman/listinfo/python-list


Python script seems to stop running when handling very large dataset

2021-10-29 Thread Shaozhong SHI
My Python script works well, but seems to stop running at a certain point
when handling a very large dataset.

Can anyone shed light on this?

Regards, David
-- 
https://mail.python.org/mailman/listinfo/python-list


How to store the result of df.count() as a new dataframe in Pandas?

2021-10-26 Thread Shaozhong SHI
Hello,

The result of df.count() appears to be a Series object.  How do I store the
result of df.count() as a new DataFrame in Pandas?

That is data anyhow.
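
A sketch of a few ways to turn that Series into a DataFrame. The sample frame and the names "count" and "column" are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data with some missing values.
df = pd.DataFrame({"a": [1, None, 3], "b": ["x", "y", None], "c": [1.0, 2.0, 3.0]})

counts = df.count()  # Series indexed by the original column names

# One row per original column, counts in a column called "count".
as_column = counts.to_frame(name="count")

# Same data, with the column names moved into an ordinary column.
as_table = counts.rename_axis("column").reset_index(name="count")

# A single row that keeps the original column names as columns.
as_row = counts.to_frame().T

print(as_table)
```

Which shape is best depends on what happens next: as_table is convenient for writing to a file or database, as_row for appending to other one-row summaries.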

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


df.count() to a Pandas dataframe with column names

2021-10-21 Thread Shaozhong SHI
How do I output the result of df.count() to a Pandas DataFrame with column
names?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Alternatives to Jupyter Notebook

2021-10-20 Thread Shaozhong SHI
Hello,

Is anyone familiar with alternatives to Jupyter Notebook?

My Jupyter notebook becomes unresponsive in browsers.

Are there alternatives for reading, editing and running Jupyter notebooks?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


SQLAlchemy fault

2021-10-20 Thread Shaozhong SHI
I read a txt file into a Pandas DataFrame, and found a lot of nulls in a
column.

Then, I used SQLAlchemy and psycopg2 to create an engine.

I loaded the data into PostgreSQL.

A strange thing happened.

The column has no nulls at all.

Does it mean that the data has been modified somewhere along the line?

Does anyone know a robust and fast way of loading Pandas DataFrame data into
a Postgres database?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Connecting to MS accdb and read data into Pandas

2021-10-12 Thread Shaozhong SHI
I tried the following code:
import pyodbc

# Adjacent string literals are joined, so the connection string stays unbroken.
conn = pyodbc.connect(
    r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};'
    r'DBQ=D:\my.accdb;'
)
cursor = conn.cursor()
cursor.execute('select * from table_name')

for row in cursor.fetchall():
    print(row)


But I could not connect to .accdb.

What is the robust way to set the connection string?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Definitive guide for Regex

2021-10-01 Thread Shaozhong SHI
Hi, Barry,

In cases of automating checking, validation and producing reports in the
context of data quality control and giving specific feedback to production
teams, regex is perhaps the only way.

Perhaps we can give each element of the data specification a name that is
associated with a regex, so that we can automate checking and reporting on
data sets.  We can report which rows meet the specification and requirements
and which do not, and, when a row does not meet them, which cell needs to be
corrected.
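
To make the idea concrete, here is a rough sketch. The field names, the simplified patterns and the sample rows are all hypothetical; a real specification would supply its own.

```python
import re

# Hypothetical specification: field name -> regex the whole cell must match.
SPEC = {
    "postcode": re.compile(r"[A-Z]{1,2}\d[A-Z\d]? ?\d[A-Z]{2}"),
    "report_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def validate(rows, spec=SPEC):
    """Return a list of (row_number, field, value) for every failing cell."""
    failures = []
    for row_number, row in enumerate(rows, start=1):
        for field, pattern in spec.items():
            value = row.get(field, "")
            if pattern.fullmatch(value) is None:
                failures.append((row_number, field, value))
    return failures


# Hypothetical sample rows: the first meets the specification, the second does not.
rows = [
    {"postcode": "B31 2AB", "report_date": "2021-09-30"},
    {"postcode": "12345", "report_date": "30/09/2021"},
]
for row_number, field, value in validate(rows):
    print("row %d: %s=%r does not meet the specification" % (row_number, field, value))
```

Rows that do not appear in the result meet the specification, so the same list serves both the pass/fail report and the cell-level feedback.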

What do you think?

Regards,

David

On Thu, 30 Sept 2021 at 22:02, Barry Scott  wrote:

>
>
> > On 30 Sep 2021, at 19:35, dn via Python-list 
> wrote:
> >
> > On 01/10/2021 06.16, Barry Scott wrote:
> >>
> >>
> >>> On 30 Sep 2021, at 12:29, Shaozhong SHI 
> wrote:
> >>>
> >>> Dear All,
> >>>
> >>> I am trying to look for a definitive guide for Regex in Python.
> >>> Can anyone help?
> >>
> >> Have you read the python docs for the re module?
> >
> >
> > I learned from Jeffrey Friedl's book "Mastering Regular Expressions",
> > but that was in a land far away, last century, and under a different
> > language (and the original version - I see it's now up to its third
> > edition).
> >
> > Despite their concise exercise of power (and the fact that in my
> > Python-life I've never been put into a corner where I absolutely must
> > use one), I'm no longer a fan...
>
> Agreed, regex is the last tool I reach for in python code.
> I find I use split() a lot to break up strings for processing.
> But there are cases where a regex is the best tool for a particular job
> and I then use the re module. But it costs in maintainability.
>
> I speak as the author of a regex engine and know how to write scary
> regex's when the need arises.
>
> Barry
>
>
> > --
> > Regards,
> > =dn
> > --
> > https://mail.python.org/mailman/listinfo/python-list
> >
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Definitive guide for Regex

2021-09-30 Thread Shaozhong SHI
Dear All,

I am trying to look for a definitive guide for Regex in Python.
Can anyone help?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Automated data testing, checking, validation, reporting for data assurance

2021-09-29 Thread Shaozhong SHI
There appear to be a few options for this.

Has anyone tested and got experience with automated data testing,
validation and reporting?

Can anyone enlighten me?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Observing long running processes of Jupyter Notebook

2020-12-03 Thread Shaozhong SHI
We have been running Jupyter Notebook processes, which take a long time to
run.

We use nbconvert to run these from the command line.  Nbconvert only writes
output into a file at the end.

We wonder whether there is a way to observe the progress and the printed
messages while nbconvert is running.

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


How to run Jupyter notebook in command line and get full error message?

2020-11-28 Thread Shaozhong SHI
How do I run a Jupyter notebook from the command line and get full error
messages?

My VPN keeps dropping, so I cannot run Jupyter Notebook as it is.

I started to use nbconvert from the command line.

But when it stops due to an error, I cannot see where the error occurs.

What is the best practice to make debugging easier?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


How to record full error message for debugging with nbconvert?

2020-11-28 Thread Shaozhong SHI
Hi,
When I use nbconvert to run a Jupyter notebook, it is very difficult to see
the full error message for debugging.

How do I save the full error messages?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


ssl connection has been closed unexpectedly

2020-11-28 Thread Shaozhong SHI
Hi,

I keep getting the following error when I use engine =
create_engine(logging in details to postgres)
df.to_sql('table_name', and etc.)


OperationalError: (psycopg2.OperationalError) SSL connection has been
closed unexpectedly

(Background on this error at: http://sqlalche.me/e/13/e3q8)

Can anyone shed any light on this?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Questions about XML processing?

2020-11-07 Thread Shaozhong SHI
Hi, Hernan,

Did you try to parse GML?

Surely, there can be very concise and smart ways to do these things.
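
For the tagC problem quoted below, one fairly concise option is to filter on findtext while looping over the tagC elements. This is only a sketch: the tag names follow Hernan's description, but the sample document and its layout are assumptions, since the original XML did not survive in the archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: tag names follow the quoted description,
# the layout is assumed.
XML = """
<root>
  <tagA>
    <tagB>
      <tagC>
        <string>Something</string>
        <string>Something else</string>
        <note><title><string>value</string></title></note>
      </tagC>
      <tagC>
        <string>Ignored</string>
        <note><title><string>other</string></title></note>
      </tagC>
    </tagB>
  </tagA>
</root>
"""

elem = ET.fromstring(XML)
strings = [
    s.text
    for tag_c in elem.findall("./tagA/tagB/tagC")
    # findtext returns None when the path is missing, so the test is safe.
    if tag_c.findtext("note/title/string") == "value"
    for s in tag_c.findall("string")
]
print(", ".join(strings))
```

Only the direct string children of a matching tagC are collected; the string nested under note/title is used for the test but not returned.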

Regards,

David

On Fri, 6 Nov 2020 at 20:57, Hernán De Angelis 
wrote:

> Thank you Terry, Dan and Dieter for encouraging me to post here. I have
> already solved the problem albeit with a not so efficient solution.
> Perhaps, it is useful to present it here anyway in case some light can
> be added to this.
>
> My job is to parse a complicated XML (iso metadata) and pick up values
> of certain fields in certain conditions. This goes for the most part
> well. I am working with xml.etree.elementtree, which proved sufficient
> for the most part and the rest of the project. JSON is not an option
> within this project.
>
> The specific trouble was in this section, itself the child of a more
> complicated parent: (for simplicity tags are renamed and namespaces
> removed)
>
>
>  
>
>  Something
>
>
>  Something else
>
>
>  
>
>  value
>
>
>  
>
> 2020-11-06
>
>
>  
>
>  
>
>  
>
>  
>
>
> Basically, I have to get what is in tagC/string but only if the value of
> tagC/note/title/string is "value". As you see, there are several tagC,
> all children of tagB, but tagC can have different meanings(!). And no, I
> have no control over how these XML fields are constructed.
>
> In principle it is easy to make a "findall" and get strings for tagC,
> using:
>
> elem.findall("./tagA/tagB/tagC/string")
>
> and then get the content and append in case there is more than one
> tagC/string like: "Something, Something else".
>
> However, the hard thing to do here is to get those only when
> tagC/note/title/string='value'. I was expecting to find a way of
> specifying a certain construction in square brackets, like
> [@string='value'] or [@/tagC/note/title/string='value'], as is usual in
> XML and possible in xml.etree. However this proved difficult (at least
> for me). So this is the "brute" solution I implemented:
>
> - find all children of tagA/tagB
> - check if /tagA/tagB/tagC/note/title/string has "value"
> - if yes find all tagA/tagB/tagC/string
>
> In quasi-Python:
>
> strings = []
> # Children of tagB are the tagC elements.
> for tag_c in elem.findall("./tagA/tagB/*"):
>     # Paths below are relative to the current tagC element.
>     title = tag_c.find("./note/title/string")
>     if title is not None and title.text == 'value':
>         for s in tag_c.findall("./string"):
>             strings.append(s.text)
>
>
> Crude, but works. As I wrote above, I was wishing that a bracketed
> clause of the type [@ ...] already in the first "findall" would do a
> more efficient job but alas my knowledge of xml is too rudimentary.
> Perhaps something to tinker on in the coming weeks.
>
> Have a nice weekend!
>
>
>
>
>
> On 2020-11-06 20:10, Terry Reedy wrote:
> > On 11/6/2020 11:17 AM, Hernán De Angelis wrote:
> >> I am confronting some XML parsing challenges and would like to ask
> >> some questions to more knowledgeable Python users. Apparently there
> >> exists a group for such questions but that list (xml-sig) has
> >> apparently not received (or archived) posts since May 2018(!). I
> >> wonder if there are other list or forum for Python XML questions, or
> >> if this list would be fine for that.
> >
> > If you don't hear otherwise, try here.  Or try stackoverflow.com and
> > tag questions with python and xml.
> >
> >
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Dataframe to postgresql - Saving the dataframe to memory using StringIO

2020-10-22 Thread Shaozhong SHI
I found this last option is very interesting.

Saving the dataframe to memory using StringIO

https://naysan.ca/2020/06/21/pandas-to-postgresql-using-psycopg2-copy_from/

But, testing shows
unicode argument expected, got 'str'

Any working example for getting DataFrame into a PostgreSQL table directly?

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


How to write differently to remove this type hint in Python 2.7?

2020-10-21 Thread Shaozhong SHI
Is there another way to do this?

def greet(name: str) -> str:
    return "Hello, " + name
greet

  File "", line 1
    def greet(name: str) -> str:
                  ^
SyntaxError: invalid syntax
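
One option that Python 2.7 accepts is the comment form of type hints described in PEP 484. A minimal sketch:

```python
def greet(name):
    # type: (str) -> str
    return "Hello, " + name


print(greet("David"))
```

The interpreter ignores the comment, so it runs unchanged on Python 2.7 and Python 3, while type checkers that support type comments can still read it.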
-- 
https://mail.python.org/mailman/listinfo/python-list


How to expand and flatten a nested of list of dictionaries of varied lengths?

2020-10-18 Thread Shaozhong SHI
Even worse, in some cases an additional key called serviceRatings appears
unexpectedly with new data.

How do I write a robust Python/Pandas script that copes with all of these?

Regards,

David

u'historicRatings': [{u'overall': {u'keyQuestionRatings': [{u'name':
u'Safe', u'rating': u'Requires improvement'}, {u'name': u'Well-led',
u'rating': u'Requires improvement'}], u'rating': u'Requires improvement'},
u'reportDate': u'2019-10-04', u'reportLinkId':
u'63ff05ec-4d31-406e-83de-49a271cfdc43'}, {u'overall':
{u'keyQuestionRatings': [{u'name': u'Safe', u'rating': u'Good'}, {u'name':
u'Well-led', u'rating': u'Good'}, {u'name': u'Caring', u'rating': u'Good'},
{u'name': u'Responsive', u'rating': u'Good'}, {u'name': u'Effective',
u'rating': u'Requires improvement'}], u'rating': u'Good'}, u'reportDate':
u'2017-09-08', u'reportLinkId': u'4f20da40-89a4-4c45-a7f9-bfd52b48f286'},
{u'overall': {u'keyQuestionRatings': [{u'name': u'Safe', u'rating':
u'Requires improvement'}, {u'name': u'Well-led', u'rating': u'Requires
improvement'}, {u'name': u'Caring', u'rating': u'Requires improvement'},
{u'name': u'Responsive', u'rating': u'Requires improvement'}, {u'name':
u'Effective', u'rating': u'Good'}], u'rating': u'Requires improvement'},
u'reportDate': u'2016-06-11', u'reportLinkId':
u'0cc4226b-401e-4f0f-ba35-062cbadffa8f'}, {u'overall':
{u'keyQuestionRatings': [{u'name': u'Safe', u'rating': u'Good'}, {u'name':
u'Well-led', u'rating': u'Good'}, {u'name': u'Caring', u'rating': u'Good'},
{u'name': u'Responsive', u'rating': u'Requires improvement'}, {u'name':
u'Effective', u'rating': u'Good'}], u'rating': u'Good'}, u'reportDate':
u'2015-01-12', u'reportLinkId': u'a11c1e52-ddfd-4cd8-8b56-1b96ac287c96'}]
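
A sketch of one way to flatten data shaped like the above into one row per key question, using .get so that missing or extra keys do not break the loop. The sample record is abbreviated from the data above; where serviceRatings sits in the structure is an assumption.

```python
def flatten_ratings(record):
    """Yield one flat dict per key question in every historic rating."""
    for report in record.get("historicRatings", []):
        overall = report.get("overall", {})
        for question in overall.get("keyQuestionRatings", []):
            yield {
                "reportDate": report.get("reportDate"),
                "reportLinkId": report.get("reportLinkId"),
                "overallRating": overall.get("rating"),
                "question": question.get("name"),
                "rating": question.get("rating"),
                # Optional key (assumed location): None when it is absent.
                "serviceRatings": report.get("serviceRatings"),
            }


# Abbreviated sample based on the data above.
record = {
    "historicRatings": [
        {
            "overall": {
                "keyQuestionRatings": [
                    {"name": "Safe", "rating": "Requires improvement"},
                    {"name": "Well-led", "rating": "Requires improvement"},
                ],
                "rating": "Requires improvement",
            },
            "reportDate": "2019-10-04",
            "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43",
        },
        {
            "overall": {
                "keyQuestionRatings": [{"name": "Safe", "rating": "Good"}],
                "rating": "Good",
            },
            "reportDate": "2017-09-08",
            "reportLinkId": "4f20da40-89a4-4c45-a7f9-bfd52b48f286",
        },
    ]
}

rows = list(flatten_ratings(record))
for row in rows:
    print(row["reportDate"], row["question"], row["rating"])
```

The resulting list of flat dicts can be passed to pandas.DataFrame(rows) if a table is needed; lists of different lengths simply produce different numbers of rows.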
-- 
https://mail.python.org/mailman/listinfo/python-list


Are there Python ways to execute queries on PostgreSQL without getting data over?

2020-10-18 Thread Shaozhong SHI
Are there Python ways to execute queries on PostgreSQL without getting data
over?

Are there ways just to fire off PostgreSQL queries and not get data into
Python?
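
A sketch of the general pattern: execute the statement and simply never fetch. sqlite3 stands in for psycopg2 here so that the example runs without a server; both follow the Python DB-API 2.0 style, but psycopg2 uses %s placeholders and needs real connection details. The table names are made up.

```python
import sqlite3

# sqlite3 stands in for psycopg2; table names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE src (id INTEGER, val TEXT)")
cur.executemany("INSERT INTO src VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

# The statement runs inside the database; no rows are pulled into Python
# because fetchone()/fetchall() is never called on it.
cur.execute("CREATE TABLE dst AS SELECT id, val FROM src WHERE id > 1")
conn.commit()

# Only a single aggregate value is fetched to confirm the work was done.
cur.execute("SELECT COUNT(*) FROM dst")
row_count = cur.fetchone()[0]
print(row_count)
conn.close()
```

Statements such as INSERT ... SELECT, UPDATE or CREATE TABLE AS keep the data on the database side; only the commit is needed from Python.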

Regards,

David
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: ValueError: arrays must all be same length

2020-10-05 Thread Shaozhong SHI
Hi, I managed to flatten it with json_normalize first.

from pandas.io.json import json_normalize
atable = json_normalize(d)
atable

Then, I got this table.

brandId brandName careHome constituency
currentRatings.overall.keyQuestionRatings currentRatings.overall.rating
currentRatings.overall.reportDate currentRatings.overall.reportLinkId
currentRatings.reportDate dormancy ... providerId region registrationDate
registrationStatus regulatedActivities relationships reports specialisms
type uprn
0 BD510 BRAND MACC Care Y Birmingham, Northfield [{u'reportDate':
u'2020-10-01', u'rating': u'R... Requires improvement 2020-10-01
1157c975-c2f1-423e-a2b4-66901779e014 2020-10-01 N ... 1-101641521 West
Midlands 2013-12-16 Registered [{u'code': u'RA2', u'name': u'Accommodation

Then, I tried to expand the column
of currentRatings.overall.keyQuestionRatings, with

mydf =
pd.DataFrame.from_dict(atable['currentRatings.overall.keyQuestionRatings'][0])
mydf

Then, I got another table.

   name        rating                reportDate  reportLinkId
0  Safe        Requires improvement  2020-10-01  1157c975-c2f1-423e-a2b4-66901779e014
1  Well-led    Requires improvement  2020-10-01  1157c975-c2f1-423e-a2b4-66901779e014
2  Caring      Good                  2019-10-04  63ff05ec-4d31-406e-83de-49a271cfdc43
3  Responsive  Good                  2019-10-04  63ff05ec-4d31-406e-83de-49a271cfdc43
4  Effective   Requires improvement  2019-10-04  63ff05ec-4d31-406e-83de-49a271cfdc43


How can I re-arrange this to get a flattened table?

Apparently, the nested data is another table.
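
One possibility is to let json_normalize expand the nested list itself through record_path, and carry the parent fields along with meta. The record below is a hypothetical, cut-down version of the response, and pd.json_normalize assumes pandas 1.0 or later (older versions import it from pandas.io.json as above).

```python
import pandas as pd

# Hypothetical record shaped like the API response described above.
d = {
    "brandId": "BD510",
    "currentRatings": {
        "overall": {
            "rating": "Requires improvement",
            "keyQuestionRatings": [
                {"name": "Safe", "rating": "Requires improvement"},
                {"name": "Caring", "rating": "Good"},
            ],
        }
    },
}

flat = pd.json_normalize(
    d,
    # Path to the nested list: one output row per item in it.
    record_path=["currentRatings", "overall", "keyQuestionRatings"],
    # Parent fields repeated on every row.
    meta=["brandId", ["currentRatings", "overall", "rating"]],
    # Prefix keeps the nested "rating" apart from the overall one.
    record_prefix="question.",
)
print(flat)
```

Each nested rating becomes its own row with the parent values repeated, which gives a single flat table rather than a table inside a cell.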

Regards,

Shao



On Sun, 4 Oct 2020 at 13:55, Tim Williams  wrote:

> On Sun, Oct 4, 2020 at 8:39 AM Tim Williams  wrote:
>
> >
> >
> > On Fri, Oct 2, 2020 at 11:00 AM Shaozhong SHI 
> > wrote:
> >
> >> Hello,
> >>
> >> I got a json response from an API and tried to use pandas to put data
> >> into a dataframe.
> >>
> >> However, I kept getting this ValueError: arrays must all be same length.
> >>
> >> Can anyone help?
> >>
> >> The following is the json text.  Regards, Shao
> >>
> >> (snip json_text)
> >
> >
> >> import pandas as pd
> >>
> >> import json
> >>
> >> j = json.JSONDecoder().decode(req.text)  ###req.json
> >>
> >> df = pd.DataFrame.from_dict(j)
> >>
> >
> > I copied json_text into a Jupyter notebook and got the same error trying
> > to convert this into a pandas DataFrame. When I tried to copy this into a
> > string, I got an error, but without enclosing the paste in quotes, I got
> > the dictionary.
> >
> >
> (delete long response output)
>
>
> > for k in json_text.keys():
> >     if isinstance(json_text[k], list):
> >         print(k, len(json_text[k]))
> >
> > relationships 0
> > locationTypes 0
> > regulatedActivities 2
> > gacServiceTypes 1
> > inspectionCategories 1
> > specialisms 4
> > inspectionAreas 0
> > historicRatings 4
> > reports 5
> >
> > HTH.
> >
> >
> This may also be more of a pandas issue.
>
> json.loads(json.dumps(json_text))
>
> has a successful round-trip
>
>
>
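
The list lengths printed in the quoted reply differ from key to key, which
appears to be what triggers the error: when a dictionary is passed directly,
pandas treats every list value as a column and expects them all to have the
same length. A minimal sketch of that behaviour, using a cut-down stand-in
for the response (the record below is an assumption, not the full data):

```python
import pandas as pd

# Cut-down stand-in for the response: two list values of different lengths.
record = {
    "locationId": "1-1004508435",
    "gacServiceTypes": [{"name": "Nursing homes"}],
    "specialisms": [{"name": "Dementia"}, {"name": "Physical disabilities"}],
}

# from_dict treats each list as a column, so lengths 1 and 2 cannot be aligned.
try:
    pd.DataFrame.from_dict(record)
    raised = False
except ValueError:
    raised = True

# json_normalize keeps each list as a single cell, giving one row per record.
row = pd.json_normalize(record)

print(raised, row.shape)
```

This explains why json_normalize succeeds where from_dict fails, although the
list cells still need to be expanded separately.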
-- 
https://mail.python.org/mailman/listinfo/python-list


How to handle a dictionary value that is a list

2020-10-02 Thread Shaozhong SHI
Hi, All,

I was trying to handle the value of "personRoles" in part of a JSON
dictionary.

Can anyone tell me various ways to handle this?

Regards,

Shao

"regulatedActivities": [
  {
    "name": "Accommodation for persons who require nursing or personal care",
    "code": "RA2",
    "contacts": [
      {
        "personTitle": "Mr",
        "personGivenName": "Steven",
        "personFamilyName": "Great",
        "personRoles": [
          "Registered Manager"
        ]

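import pandas as pd

# d is the decoded JSON dictionary that contains the fragment above.
e, f = [], []
for result in d['regulatedActivities']:
    # 'contacts' is a list of dicts, so loop over it before reading 'personRoles'.
    for contact in result['contacts']:
        for s in contact['personRoles']:
            e.append(result['name'])  # repeat the activity name for each role
            f.append(s)
df1 = pd.DataFrame([e, f]).T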
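
Another way to handle a list nested inside a list of dictionaries, sketched
with pandas: json_normalize can produce one row per contact, and
DataFrame.explode can then produce one row per role. The data below is a
cut-down stand-in for the "regulatedActivities" part of the response, and the
variable names are illustrative only.

```python
import pandas as pd

# Cut-down stand-in for the "regulatedActivities" part of the response.
d = {
    "regulatedActivities": [
        {"name": "Accommodation for persons who require nursing or personal care",
         "code": "RA2",
         "contacts": [{"personTitle": "Mr", "personGivenName": "Steven",
                       "personRoles": ["Registered Manager"]}]},
        {"name": "Treatment of disease, disorder or injury",
         "code": "RA5",
         "contacts": [{"personTitle": "Mr", "personGivenName": "Steven",
                       "personRoles": ["Registered Manager"]}]},
    ]
}

# One row per contact, with the activity's name and code carried along.
contacts = pd.json_normalize(
    d["regulatedActivities"],
    record_path="contacts",
    meta=["name", "code"],
)

# One row per role: explode turns each list element into its own row.
roles = contacts.explode("personRoles").reset_index(drop=True)

print(roles[["name", "code", "personRoles"]])
```

A contact with several roles would appear on several rows, one per role.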
-- 
https://mail.python.org/mailman/listinfo/python-list


ValueError: arrays must all be same length

2020-10-02 Thread Shaozhong SHI
Hello,

I got a json response from an API and tried to use pandas to put data into
a dataframe.

However, I kept getting this ValueError: arrays must all be same length.

Can anyone help?

The following is the json text.  Regards, Shao

{
  "locationId": "1-1004508435",
  "providerId": "1-101641521",
  "organisationType": "Location",
  "type": "Social Care Org",
  "name": "Meadow Rose Nursing Home",
  "brandId": "BD510",
  "brandName": "BRAND MACC Care",
  "onspdCcgCode": "E38000220",
  "onspdCcgName": "NHS Birmingham and Solihull CCG",
  "odsCode": "VM4G9",
  "uprn": "100070537642",
  "registrationStatus": "Registered",
  "registrationDate": "2013-12-16",
  "dormancy": "N",
  "numberOfBeds": 56,
  "postalAddressLine1": "96 The Roundabout",
  "postalAddressTownCity": "Birmingham",
  "postalAddressCounty": "West Midlands",
  "region": "West Midlands",
  "postalCode": "B31 2TX",
  "onspdLatitude": 52.399843,
  "onspdLongitude": -1.989241,
  "careHome": "Y",
  "inspectionDirectorate": "Adult social care",
  "mainPhoneNumber": "01214769808",
  "constituency": "Birmingham, Northfield",
  "localAuthority": "Birmingham",
  "lastInspection": {
    "date": "2020-06-24"
  },
  "lastReport": {
    "publicationDate": "2020-10-01"
  },
  "relationships": [],
  "locationTypes": [],
  "regulatedActivities": [
    {
      "name": "Accommodation for persons who require nursing or personal care",
      "code": "RA2",
      "contacts": [
        {
          "personTitle": "Mr",
          "personGivenName": "Steven",
          "personFamilyName": "Kazembe",
          "personRoles": [
            "Registered Manager"
          ]
        }
      ]
    },
    {
      "name": "Treatment of disease, disorder or injury",
      "code": "RA5",
      "contacts": [
        {
          "personTitle": "Mr",
          "personGivenName": "Steven",
          "personFamilyName": "Kazembe",
          "personRoles": [
            "Registered Manager"
          ]
        }
      ]
    }
  ],
  "gacServiceTypes": [
    {
      "name": "Nursing homes",
      "description": "Care home service with nursing"
    }
  ],
  "inspectionCategories": [
    {
      "code": "S1",
      "primary": "true",
      "name": "Residential social care"
    }
  ],
  "specialisms": [
    {
      "name": "Caring for adults over 65 yrs"
    },
    {
      "name": "Caring for adults under 65 yrs"
    },
    {
      "name": "Dementia"
    },
    {
      "name": "Physical disabilities"
    }
  ],
  "inspectionAreas": [],
  "currentRatings": {
    "overall": {
      "rating": "Requires improvement",
      "reportDate": "2020-10-01",
      "reportLinkId": "1157c975-c2f1-423e-a2b4-66901779e014",
      "useOfResources": {},
      "keyQuestionRatings": [
        {
          "name": "Safe",
          "rating": "Requires improvement",
          "reportDate": "2020-10-01",
          "reportLinkId": "1157c975-c2f1-423e-a2b4-66901779e014"
        },
        {
          "name": "Well-led",
          "rating": "Requires improvement",
          "reportDate": "2020-10-01",
          "reportLinkId": "1157c975-c2f1-423e-a2b4-66901779e014"
        },
        {
          "name": "Caring",
          "rating": "Good",
          "reportDate": "2019-10-04",
          "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43"
        },
        {
          "name": "Responsive",
          "rating": "Good",
          "reportDate": "2019-10-04",
          "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43"
        },
        {
          "name": "Effective",
          "rating": "Requires improvement",
          "reportDate": "2019-10-04",
          "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43"
        }
      ]
    },
    "reportDate": "2020-10-01"
  },
  "historicRatings": [
    {
      "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43",
      "reportDate": "2019-10-04",
      "overall": {
        "rating": "Requires improvement",
        "keyQuestionRatings": [
          {
            "name": "Safe",
            "rating": "Requires improvement"
          },
          {
            "name": "Well-led",
            "rating": "Requires improvement"
          }
        ]
      }
    },
    {
      "reportLinkId": "4f20da40-89a4-4c45-a7f9-bfd52b48f286",
      "reportDate": "2017-09-08",
      "overall": {
        "rating": "Good",
        "keyQuestionRatings": [
          {
            "name": "Safe",
            "rating": "Good"
          },
          {
            "name": "Well-led",
            "rating": "Good"
          },
          {
            "name": "Caring",
            "rating": "Good"
          },
          {
            "name": "Responsive",
            "rating": "Good"
          },
          {
            "name": "Effective",
            "rating": "Requires improvement"
          }
        ]
      }
    },
    {
      "reportLinkId": "0cc4226b-401e-4f0f-ba35-062cbadffa8f",
      "reportDate": "2016-06-11",
      "overall": {
"rating": "Requires