Re: tree representation of Python data
What is a robust way to use Python to read in an XML file and turn it into a JSON file? A JSON dictionary is effectively a tree, and tree-structured data is much easier to manage. Regards, David -- https://mail.python.org/mailman/listinfo/python-list
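A minimal standard-library sketch of the XML-to-JSON idea: walk the element tree recursively and build nested dicts, turning repeated sibling tags into lists. Real-world XML with attributes, namespaces or mixed content would need more care (or a third-party library such as xmltodict); the sample document here is made up for illustration.

```python
import json
import xml.etree.ElementTree as ET

def etree_to_dict(elem):
    """Recursively convert an ElementTree node into nested dicts/lists."""
    children = list(elem)
    if not children:
        return elem.text  # leaf node: just its text
    d = {}
    for child in children:
        value = etree_to_dict(child)
        if child.tag in d:  # repeated tag -> collect into a list
            if not isinstance(d[child.tag], list):
                d[child.tag] = [d[child.tag]]
            d[child.tag].append(value)
        else:
            d[child.tag] = value
    return d

xml_text = "<root><item>a</item><item>b</item><name>demo</name></root>"
tree = ET.fromstring(xml_text)
result = {tree.tag: etree_to_dict(tree)}
print(json.dumps(result))
```

The resulting dict can be dumped straight to a .json file with json.dump, after which it behaves as the tree structure described above.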
Missing global # gdal DRIVER_NAME declaration in gdal_array.py
The following warning kept coming up when running ogr2ogr. Warning 1: Missing global # gdal: DRIVER_NAME declaration in C:\Users\AppData\Local\Programs\Python\Python36\Lib\site-packages\osgeo\gdal_array.py What steps should be taken to resolve this issue? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Issues of pip install gdal and fiona
I downloaded .whl files for fiona and gdal to go with Python 3.6.5. However, I am having trouble with red error messages. Though gdal is now working, there is a warning message - Missing global # gdal: DRIVER_NAME declaration in gdal_array.py Can anyone advise on how to resolve these issues? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Re: Re: Long running process - how to speed up?
On Sat, 19 Feb 2022 at 18:51, Alan Gauld wrote: > On 19/02/2022 11:28, Shaozhong SHI wrote: > > > I have a cvs file of 932956 row > > That's not a lot in modern computing terms. > > > and have to have time.sleep in a Python > > script. > > Why? Is it a requirement by your customer? Your manager? > time.sleep() is not usually helpful if you want to do > things quickly. > > > It takes a long time to process. > > What is a "long time"? minutes? hours? days? weeks? > > It should take a million times as long as it takes to > process one row. But you have given no clue what you > are doing in each row. > - reading a database? > - reading from the network? or the internet? > - writing to a database? or the internet? > - performing highly complex math operations? > > Or perhaps the processing load is in analyzing the totality > of the data after reading it all? A very different type > of problem. But we just don't know. > > All of these factors will affect performance. > > > How can I speed up the processing? > > It all depends on the processing. > You could try profiling your code to see where the time is spent. > > > Can I do multi-processing? > > Of course. But there is no guarantee that will speed things > up if there is a bottleneck on a single resource somewhere. > But it might be possible to divide and conquer and get better > speed. It all depends on what you are doing. We can't tell. > > We cannot answer such a vague question with any specific > solution. > > -- > Alan G > Author of the Learn to Program web site > http://www.alan-g.me.uk/ > http://www.amazon.com/author/alan_gauld > Follow my photo-blog on Flickr at: > http://www.flickr.com/photos/alangauldphotos > > -- > https://mail.python.org/mailman/listinfo/python-list Do not know these answers yet. Now, it appeared to hang/stop at a point and does not move on. Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
On Sat, 19 Feb 2022 at 19:44, Mats Wichmann wrote: > On 2/19/22 05:09, Shaozhong SHI wrote: > > Can it be divided into several processes? > > Regards, > > David > > The answer is: "maybe". Multiprocessing doesn't happen for free, you > have to figure out how to divide the task up, requiring thought and > effort. We can't guess to what extent the problem you have is amenable > to multiprocessing. > > Google for "dataframe" and "multiprocessing" and you should get some > hits (in my somewhat limited experience in this area, people usually > load the csv data into Pandas before they get started working with it). > > > -- > https://mail.python.org/mailman/listinfo/python-list

I am trying this approach:

import multiprocessing as mp

def my_func(x):
    print(x**x)

def main():
    pool = mp.Pool(mp.cpu_count())
    result = pool.map(my_func, [4, 2, 3])

if __name__ == "__main__":
    main()

I modified the script and set off a test run. However, I have no idea whether this approach will be faster than the conventional approach. Does anyone have an idea? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Re: Long running process - how to speed up?
Can it be divided into several processes? Regards, David On Saturday, 19 February 2022, Chris Angelico wrote: > On Sat, 19 Feb 2022 at 22:59, Karsten Hilbert > wrote: > > > > > > I have a cvs file of 932956 row and have to have time.sleep in a > Python > > > > script. It takes a long time to process. > > > > > > > > How can I speed up the processing? Can I do multi-processing? > > > > > > > Remove the time.sleep()? > > > > He's attesting to only having "time.sleep" in there... > > > > I doubt removing that will help much ;-) > > I honestly don't understand the question, hence offering the > stupidly-obvious suggestion in the hope that it would result in a > better question. A million rows of CSV, on its own, isn't all that > much to process, so it must be the processing itself (of which we have > no information other than this reference to time.sleep) that takes all > the time. > > ChrisA > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Long running process - how to speed up?
I have a csv file of 932,956 rows and have to have time.sleep in a Python script. It takes a long time to process. How can I speed up the processing? Can I do multi-processing? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
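As the replies in this thread note, a million CSV rows is not much on its own; before reaching for multiprocessing it is worth streaming the file in bounded chunks, which keeps memory flat and gives natural work units to hand to a Pool later. A sketch, where process_row is a hypothetical stand-in for the real per-row work:

```python
import csv
import io

def process_row(row):
    # placeholder for the real per-row work (hypothetical)
    return row["id"].upper()

def process_in_chunks(fileobj, chunk_size=100000):
    """Stream a large CSV in chunks instead of loading every row at once."""
    reader = csv.DictReader(fileobj)
    results = []
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) == chunk_size:
            # each full chunk is a self-contained unit of work; this is
            # what could later be handed to multiprocessing.Pool.map
            results.extend(process_row(r) for r in chunk)
            chunk = []
    results.extend(process_row(r) for r in chunk)  # trailing partial chunk
    return results

sample = io.StringIO("id,value\na1,10\nb2,20\nc3,30\n")
print(process_in_chunks(sample, chunk_size=2))
```

Whether multiprocessing then helps depends on where the time actually goes, as the other replies point out: it speeds up CPU-bound per-chunk work, not I/O bottlenecks.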
URLError:
The following is used in a loop to get the response code for each url. print(urllib.request.urlopen(url).getcode()) However, the error message says: URLError: Python 3.6.5 is being used to test whether each url is live or not. Can anyone shed light on this? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
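A bare urlopen inside a loop will abort the whole run on the first dead URL. One way to make the check robust is to wrap each call so that every failure mode (HTTP error status, DNS failure, refused connection, malformed URL) is caught and reported rather than raised. A sketch; the demo uses file:// URLs so it needs no network access:

```python
import os
import tempfile
import urllib.request
from urllib.error import HTTPError, URLError

def check_url(url, timeout=10):
    """Return (True, status) if the URL opens, else (False, reason)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return True, resp.getcode()
    except HTTPError as e:                # server answered with an error code
        return False, e.code
    except (URLError, ValueError) as e:   # no DNS, refused, bad URL, ...
        return False, str(e)

# demo: a file that exists vs. one that does not
fd, path = tempfile.mkstemp()
os.close(fd)
ok, info = check_url("file://" + path)
bad_ok, bad_info = check_url("file:///no/such/file/anywhere_xyz")
print(ok, bad_ok)
```

Looping this over the URL column gives a (live?, reason) pair per row instead of an unhandled URLError.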
Re: How to set environmental variables for Python
Set in the operating system, but without disturbing existing settings. Only to add at the command line. Regards, David On Mon, 17 Jan 2022 at 10:57, dn via Python-list wrote: > On 17/01/2022 22.31, Shaozhong SHI wrote: > > I got quite a few version of Python on my machine. > > > > How do I set environmental variables for Python 3.6.1 to work? > > > Set from Python, or set in the OpSys? > > https://docs.python.org/3/library/os.html?highlight=environment%20variable > > MS-Win: https://docs.python.org/3/using/windows.html#setting-envvars > -- > Regards, > =dn > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
How to set environmental variables for Python
I have got quite a few versions of Python on my machine. How do I set environmental variables for Python 3.6.1 to work? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
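One way to point a single session at a specific interpreter without disturbing system-wide settings is to prepend its directory to PATH for the current process only; the change disappears when the process exits and is inherited by any subprocess launched from it. (On a Windows command prompt the equivalent one-liner is "set PATH=C:\Python361;%PATH%", which likewise only lasts for that window.) A sketch; the install directory shown is hypothetical:

```python
import os

def prepend_to_path(directory):
    """Prepend a directory to PATH for this process and its children only;
    the system-wide setting is left untouched."""
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")

# hypothetical Python 3.6.1 install location
prepend_to_path("/opt/python361/bin")
print(os.environ["PATH"].split(os.pathsep)[0])
```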
Can Python call and use FME modules and functions such as StreamOrderCalculator?
Can we do something like import an fme.something and make use of FME modules and functions? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
ogr2ogr can not open gfs file when loading GML
My command line kept telling me that ogr2ogr can not open the gfs file, although the file does exist. I was trying to load GML into PostGIS. Alternatively, how do I specify an XSD file to go along with reading GML files? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Installation of GeoPandas - failed at fiona
I am trying to install geopandas. I navigated to c:\ProgramData\Anaconda3\Scripts> and typed in 'pip install geopandas'. It ran but failed at fiona. I tried import geopandas as gp, but the error message says: No module named 'geopandas'. Can anyone help? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Re: System, configuration and Python performance
On Tue, 2 Nov 2021 at 00:20, Shaozhong SHI wrote: > > > On Tue, 2 Nov 2021 at 00:09, MRAB wrote: > >> On 2021-11-01 23:02, Shaozhong SHI wrote: >> > How to configure to improve Python performance in a system like the >> > following: >> > >> > Windows 10 >> > >> > System >> > >> > Processor Intel(R) Core(TM) i7-9700 CPU @3.60GHz, 3.60 GHz >> > Installed memory (RAM) 32.0 GB (31.8 GB usable) >> > System type: 64-bit Operating System, x64-based processor >> > >> > I found that the Python script was runnig slowly and wanted to find out >> > what is going on and what activities it is doing. >> > >> > I opened the Task Manager and found that there is not much CPU usage. >> > >> > Do I need to do something like configuration to improve Python's >> > performance? >> > >> If CPU usage is low, then that isn't the cause of the slowness. >> >> What about disk usage? >> >> What about network usage? >> >> If it's communicating across the internet, then it might be waiting for >> the other end. If that's the case, then there's probably not much you >> can do about it. >> > > Both disk usage and network usage are very low as well. > > It is checking out responses of internet pages with given URLs. > > It is checking out whether each url is valid or not. > > > >> -- >> https://mail.python.org/mailman/listinfo/python-list >> > -- https://mail.python.org/mailman/listinfo/python-list
Re: System, configuration and Python performance
On Tue, 2 Nov 2021 at 00:09, MRAB wrote: > On 2021-11-01 23:02, Shaozhong SHI wrote: > > How to configure to improve Python performance in a system like the > > following: > > > > Windows 10 > > > > System > > > > Processor Intel(R) Core(TM) i7-9700 CPU @3.60GHz, 3.60 GHz > > Installed memory (RAM) 32.0 GB (31.8 GB usable) > > System type: 64-bit Operating System, x64-based processor > > > > I found that the Python script was runnig slowly and wanted to find out > > what is going on and what activities it is doing. > > > > I opened the Task Manager and found that there is not much CPU usage. > > > > Do I need to do something like configuration to improve Python's > > performance? > > > If CPU usage is low, then that isn't the cause of the slowness. > > What about disk usage? > > What about network usage? > > If it's communicating across the internet, then it might be waiting for > the other end. If that's the case, then there's probably not much you > can do about it. > Both disk usage and network usage are very low as well. It is checking out responses of internet pages with given URLs. Regards, David > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
System, configuration and Python performance
How do I configure a system like the following to improve Python performance: Windows 10 System Processor Intel(R) Core(TM) i7-9700 CPU @3.60GHz, 3.60 GHz Installed memory (RAM) 32.0 GB (31.8 GB usable) System type: 64-bit Operating System, x64-based processor I found that the Python script was running slowly and wanted to find out what is going on and what activities it is doing. I opened the Task Manager and found that there is not much CPU usage. Do I need to do something like configuration to improve Python's performance? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Re: How to apply a self defined function in Pandas
On Sun, 31 Oct 2021 at 18:42, Shaozhong SHI wrote: > > > On Sunday, 31 October 2021, Albert-Jan Roskam > wrote: > >> >> >> > df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1) >> >> >> I think you need axis=0. Or use the Series, df['URL'] = >> df.URL.apply(connect) >> > > Just experimented with your suggestion, but have not seen any difference. > Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Re: How to apply a self defined function in Pandas
On Sun, 31 Oct 2021 at 19:28, MRAB wrote: > On 2021-10-31 18:48, Shaozhong SHI wrote: > > > > On Sunday, 31 October 2021, MRAB wrote: > > > > On 2021-10-31 17:25, Shaozhong SHI wrote: > > > > I defined a function and apply it to a column in Pandas. But > > it does not > > return correct values. > > > > I am trying to test which url in a column full of url to see > > which one can > > be connected to or not > > > > def connect(url): > > try: > > urllib.request.urlopen(url) > > return True > > except: > > return False > > > > df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1) > > > > I ran without any error, but did not return any true. > > > > I just could not find any error with it. > > > > Can anyone try and find out why > > > > You're passing a function to '.apply'. That has one argument,' x'. > > > > But what is the function doing with that argument? > > > > Nothing. > > > > The function is just returning the result of connect(df['URL']). > > > > df['URL'] is a column, so you're passing a column to '.urlopen', > > which, of course, it doesn't understand. > > > > So 'connect' returns False. > > > > > > Please expand on how. > > > It's as simple as passing 'connect' to '.apply' as the first argument. > Well, can you expand on the simplicity? Regards, David > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: How to apply a self defined function in Pandas
On Sunday, 31 October 2021, MRAB wrote: > On 2021-10-31 17:25, Shaozhong SHI wrote: > >> I defined a function and apply it to a column in Pandas. But it does not >> return correct values. >> >> I am trying to test which url in a column full of url to see which one can >> be connected to or not >> >> def connect(url): >> try: >> urllib.request.urlopen(url) >> return True >> except: >> return False >> >> df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1) >> >> I ran without any error, but did not return any true. >> >> I just could not find any error with it. >> >> Can anyone try and find out why >> >> You're passing a function to '.apply'. That has one argument,' x'. > > But what is the function doing with that argument? > > Nothing. > > The function is just returning the result of connect(df['URL']). > > df['URL'] is a column, so you're passing a column to '.urlopen', which, of > course, it doesn't understand. > > So 'connect' returns False. > > Please expand on how. David > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
Re: How to apply a self defined function in Pandas
On Sunday, 31 October 2021, Albert-Jan Roskam wrote: > > > > df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1) > > > I think you need axis=0. Or use the Series, df['URL'] = > df.URL.apply(connect) > Any details? I will try and let you know. Regards, David -- https://mail.python.org/mailman/listinfo/python-list
How to apply a self defined function in Pandas
I defined a function and applied it to a column in Pandas, but it does not return correct values. I am trying to test each url in a column full of urls to see which ones can be connected to or not.

def connect(url):
    try:
        urllib.request.urlopen(url)
        return True
    except:
        return False

df['URL'] = df.apply(lambda x: connect(df['URL']), axis=1)

It ran without any error, but did not return any True. I just could not find any error in it. Can anyone try and find out why? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
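As MRAB's reply in this thread explains, the lambda ignores its argument and passes the whole df['URL'] column to connect, so urlopen always fails and every call returns False; the result also overwrites the URL column itself. A sketch of the fix: apply the function to the Series element-by-element and write into a new column. The connect used here is a stand-in that only checks the URL scheme, so the example runs without network access:

```python
import pandas as pd

def connect(url):
    """Stand-in for the real urllib check (assumption: no network here);
    in the original code this would call urllib.request.urlopen(url)."""
    return isinstance(url, str) and url.startswith(("http://", "https://"))

df = pd.DataFrame({"URL": ["https://example.com", "not-a-url", "http://x.org"]})

# Series.apply passes each cell to the function, one at a time;
# write the result to a NEW column instead of overwriting the URLs
df["is_live"] = df["URL"].apply(connect)
print(df["is_live"].tolist())
```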
Re: Python script seems to stop running when handling very large dataset
On Saturday, 30 October 2021, Dieter Maurer wrote: > Shaozhong SHI wrote at 2021-10-29 23:42 +0100: > >Python script works well, but seems to stop running at a certain point > when > >handling very large dataset. > > > >Can anyone shed light on this? > > Some algorithms have non linear runtime. > > > For example, it is quite easy to write code with > quadratic runtime in Python: > s = "" > for x in ...: s += f(x) > You will see the problem only for large data sets. > Has anyone compared this with iterrows? Which looping option is faster? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
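The quadratic pattern Dieter describes, and its linear replacement, side by side: repeated += copies the whole string built so far on each pass, while collecting the pieces and joining once at the end touches each character only once. Both produce identical output; only the large-input behaviour differs.

```python
def build_quadratic(items):
    # each += copies the entire string built so far: quadratic overall
    s = ""
    for x in items:
        s += str(x)
    return s

def build_linear(items):
    # collect pieces and join once at the end: linear overall
    return "".join(str(x) for x in items)

data = range(1000)
assert build_quadratic(data) == build_linear(data)
print(len(build_linear(data)))
```

This is exactly the kind of hidden non-linearity that only shows up on very large data sets, which matches the symptom in the original post.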
Python script seems to stop running when handling very large dataset
The Python script works well, but seems to stop running at a certain point when handling a very large dataset. Can anyone shed light on this? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
How to store the result of df.count() as a new dataframe in Pandas?
Hello, The result of df.count() appears to be a Series object. How do I store the result of df.count() as a new dataframe in Pandas? It is data, after all. Regards, David -- https://mail.python.org/mailman/listinfo/python-list
df.count() to a Pandas dataframe with column names
How to output the result of df.count() to a Pandas dataframe with column names? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
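Since df.count() returns a Series (index = column names, values = non-null counts), the usual conversion is to_frame() to name the value column, then reset_index() to lift the column names into a column of their own. A sketch with made-up data and column names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, np.nan], "b": ["x", None, "z"]})

# Series -> DataFrame: name the value column, lift the index into a column
counts = df.count().to_frame(name="non_null").reset_index()
counts = counts.rename(columns={"index": "column"})
print(counts)
```

The result is a regular two-column DataFrame ("column", "non_null") that can be saved or merged like any other data.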
Alternatives to Jupyter Notebook
Hello, Is anyone familiar with alternatives to Jupyter Notebook? My Jupyter Notebook becomes unresponsive in browsers. Are there alternatives to read, edit and run Jupyter notebooks? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
SQLAlchemy fault
I read a txt file into a Pandas DataFrame and found a lot of nulls in a column. Then, I used SQLAlchemy and psycopg2. I created an engine. I loaded the data into PostgreSQL. A strange thing happened: the column now has no nulls at all. Does it mean that the data has been modified somewhere along the line? Does anyone know a robust and fast way of loading Pandas DataFrame data into a Postgres database? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Connecting to MS accdb and read data into Pandas
I tried the following code:

import pyodbc
conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=D:\my.accdb;')
cursor = conn.cursor()
cursor.execute('select * from table_name')
for row in cursor.fetchall():
    print(row)

But I could not connect to the .accdb file. What is the robust way to set the connection string? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Re: Definitive guide for Regex
Hi, Barry, In cases of automating checking, validation and report production in the context of data quality control, and giving specific feedback to production teams, regex is perhaps the only way. Perhaps we can give each element of the data specification a name associated with a regex, so that we can automate checking and reporting on data sets. We could report on which rows meet the specification and requirements and which do not, and on which cells need to be corrected when a row is found not to meet the specification. What do you think? Regards, David On Thu, 30 Sept 2021 at 22:02, Barry Scott wrote: > > > > On 30 Sep 2021, at 19:35, dn via Python-list > wrote: > > > > On 01/10/2021 06.16, Barry Scott wrote: > >> > >> > >>> On 30 Sep 2021, at 12:29, Shaozhong SHI > wrote: > >>> > >>> Dear All, > >>> > >>> I am trying to look for a definitive guide for Regex in Python. > >>> Can anyone help? > >> > >> Have you read the python docs for the re module? > > > > > > I learned from Jeffrey Friedl's book "Mastering Regular Expressions", > > but that was in a land far away, last century, and under a different > > language (and the original version - I see it's now up to its third > > edition). > > > > Despite their concise exercise of power (and the fact that in my > > Python-life I've never been put into a corner where I absolutely must > > use one), I'm no longer a fan... > > Agreed, regex is the last tool I reach for in python code. > I find I use split() a lot to break up strings for processing. > But there are cases where a regex is the best tool for a particular job > and I then use the re module. But it costs in maintainability. > > I speak as the author of a regex engine and know how to write scary > regex's when the need arises. > > Barry > > > > -- > > Regards, > > =dn > > -- > > https://mail.python.org/mailman/listinfo/python-list > > > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
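The named-specification idea from this post can be sketched directly: map each named field of the specification to a compiled regex, check every row, and report which cells fail. The field names and patterns below are examples only; a real specification would define its own.

```python
import re

# each element of the data specification gets a name and a regex
# (example patterns only -- the real specification would supply its own)
SPEC = {
    "site_id": re.compile(r"^[A-Z]{2}\d{4}$"),
    "date":    re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "grade":   re.compile(r"^(Good|Requires improvement)$"),
}

def check_row(row):
    """Return the list of field names in this row that fail their regex."""
    return [field for field, pattern in SPEC.items()
            if not pattern.match(str(row.get(field, "")))]

rows = [
    {"site_id": "AB1234", "date": "2021-09-30", "grade": "Good"},
    {"site_id": "abc",    "date": "30/09/2021", "grade": "Good"},
]

# report: (row number, failing cells) for every non-conforming row
report = []
for i, row in enumerate(rows):
    bad = check_row(row)
    if bad:
        report.append((i, bad))
print(report)
```

This gives exactly the two outputs described in the post: which rows fail, and which cells within a failing row need correcting.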
Definitive guide for Regex
Dear All, I am trying to look for a definitive guide for Regex in Python. Can anyone help? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Automated data testing, checking, validation, reporting for data assurance
There appear to be a few options for this. Has anyone tested and got experience with automated data testing, validation and reporting? Can anyone enlighten me? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Observing long running processes of Jupyter Notebook
We have been running Jupyter Notebook processes, which take a long time to run. We use nbconvert to run these on the command line. Nbconvert only writes output into a file at the end. We just wonder whether there is a way to observe progress and print messages while nbconvert is running. Regards, David -- https://mail.python.org/mailman/listinfo/python-list
How to run Jupyter notebook in command line and get full error message?
How do I run a Jupyter notebook on the command line and get full error messages? My VPN keeps dropping and I can not run Jupyter Notebook as it is. I started to use nbconvert on the command line. But when it stops due to an error, I can not see where the error occurs. In order to make life easier for debugging, what is the best practice? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
How to record full error message for debugging with nbconvert?
Hi, When I use nbconvert to run a Jupyter notebook, it is very difficult to see the full error message for debugging. How do I save full error messages? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
ssl connection has been closed unexpectedly
Hi, I keep getting the following error when I use engine = create_engine(logging in details to postgres) df.to_sql('table_name', and etc.) OperationalError: (psycopg2.OperationalError) SSL connection has been closed unexpectedly (Background on this error at: http://sqlalche.me/e/13/e3q8) OperationalError: (psycopg2.OperationalError) SSL connection has been closed unexpectedly Can anyone shed any light on this? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
Re: Questions about XML processing?
Hi, Hernan, Did you try to parse GML? Surely, there can be very concise and smart ways to do these things. Regards, David On Fri, 6 Nov 2020 at 20:57, Hernán De Angelis wrote: > Thank you Terry, Dan and Dieter for encouraging me to post here. I have > already solved the problem albeit with a not so efficient solution. > Perhaps, it is useful to present it here anyway in case some light can > be added to this. > > My job is to parse a complicated XML (iso metadata) and pick up values > of certain fields in certain conditions. This goes for the most part > well. I am working with xml.etree.elementtree, which proved sufficient > for the most part and the rest of the project. JSON is not an option > within this project. > > The specific trouble was in this section, itself the child of a more > complicated parent: (for simplicity tags are renamed and namespaces > removed) > > > > > Something > > > Something else > > > > > value > > > > > 2020-11-06 > > > > > > > > > > > > Basically, I have to get what is in tagC/string but only if the value of > tagC/note/title/string is "value". As you see, there are several tagC, > all children of tagB, but tagC can have different meanings(!). And no, I > have no control over how these XML fields are constructed. > > In principle it is easy to make a "findall" and get strings for tagC, > using: > > elem.findall("./tagA/tagB/tagC/string") > > and then get the content and append in case there is more than one > tagC/string like: "Something, Something else". > > However, the hard thing to do here is to get those only when > tagC/note/title/string='value'. I was expecting to find a way of > specifying a certain construction in square brackets, like > [@string='value'] or [@/tagC/note/title/string='value'], as is usual in > XML and possible in xml.etree. However this proved difficult (at least > for me). 
So this is the "brute" solution I implemented: > > - find all children of tagA/tagB > - check if /tagA/tagB/tagC/note/title/string has "value" > - if yes find all tagA/tagB/tagC/string > > In quasi-Python: > > string = [] > element0 = elem.findall("./tagA/tagB/") > for element1 in element0: > element2 = element1.find("./tagA/tagB/tagC/note/title/string") > if element2.text == 'value' > element3 = element1.findall("./tagA/tagB/tagC/string) > for element4 in element3: > string.append(element4.text) > > > Crude, but works. As I wrote above, I was wishing that a bracketed > clause of the type [@ ...] already in the first "findall" would do a > more efficient job but alas my knowledge of xml is too rudimentary. > Perhaps something to tinker on in the coming weeks. > > Have a nice weekend! > > > > > > On 2020-11-06 20:10, Terry Reedy wrote: > > On 11/6/2020 11:17 AM, Hernán De Angelis wrote: > >> I am confronting some XML parsing challenges and would like to ask > >> some questions to more knowledgeable Python users. Apparently there > >> exists a group for such questions but that list (xml-sig) has > >> apparently not received (or archived) posts since May 2018(!). I > >> wonder if there are other list or forum for Python XML questions, or > >> if this list would be fine for that. > > > > If you don't hear otherwise, try here. Or try stackoverflow.com and > > tag questions with python and xml. > > > > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
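The XML sample in the quoted post above was lost in transit, but the structure it describes (tagC/string holding the values, tagC/note/title/string holding the marker) is enough to make the "brute" two-step solution runnable. The XML below is a hypothetical reconstruction for illustration only; the loop is the same idea as the quasi-Python, with the checks done safely:

```python
import xml.etree.ElementTree as ET

# hypothetical reconstruction of the structure described in the post:
# tagC/string holds the values, tagC/note/title/string holds the marker
xml_text = """
<root><tagA><tagB>
  <tagC>
    <string>Something</string>
    <note><title><string>value</string></title></note>
  </tagC>
  <tagC>
    <string>Something else</string>
    <note><title><string>other</string></title></note>
  </tagC>
</tagB></tagA></root>
"""

root = ET.fromstring(xml_text)
wanted = []
for tag_c in root.findall("./tagA/tagB/tagC"):
    # guard against tagC elements that have no note/title/string at all
    marker = tag_c.find("./note/title/string")
    if marker is not None and marker.text == "value":
        wanted.extend(s.text for s in tag_c.findall("./string"))
print(wanted)
```

ElementTree's XPath subset does support simple [tag='text'] predicates, but to my knowledge not deep paths like note/title/string inside the brackets, which is why the explicit inner find is used here.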
Dataframe to postgresql - Saving the dataframe to memory using StringIO
I found this last option very interesting: saving the dataframe to memory using StringIO. https://naysan.ca/2020/06/21/pandas-to-postgresql-using-psycopg2-copy_from/ But testing shows: unicode argument expected, got 'str' Is there any working example for getting a DataFrame into a PostgreSQL table directly? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
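The "unicode argument expected, got 'str'" message is characteristic of Python 2's StringIO.StringIO, which wants unicode; under Python 3 the io.StringIO class takes str and the problem goes away. A sketch of the buffer-building half of the copy_from recipe; the rows stand in for DataFrame rows, and the actual COPY call is left as a comment since it needs a live psycopg2 connection:

```python
import csv
import io

rows = [(1, "alpha"), (2, "beta")]  # stand-in for the DataFrame's rows

buffer = io.StringIO()              # io.StringIO, not Python 2's StringIO.StringIO
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
buffer.seek(0)                      # rewind so COPY reads from the start

# with a live connection the buffer would then be streamed in bulk:
#   cursor.copy_from(buffer, "table_name", sep="\t")
#   connection.commit()
print(buffer.getvalue())
```

The seek(0) matters: without it, copy_from reads from the end of the buffer and copies nothing.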
How to write differently to remove this type hint in Python 2.7?
Is there another way to do this?

def greet(name: str) -> str:
    return "Hello, " + name

File "", line 1
    def greet(name: str) -> str:
                  ^
SyntaxError: invalid syntax
-- https://mail.python.org/mailman/listinfo/python-list
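Python 2.7 has no annotation syntax, which is why the def line above is a SyntaxError there. PEP 484 type comments carry the same information in a form both Python 2.7 and Python 3 can parse, and checkers such as mypy understand them:

```python
def greet(name):
    # type: (str) -> str
    """Annotated with a PEP 484 type comment instead of inline syntax,
    so the same source parses under Python 2.7 and Python 3."""
    return "Hello, " + name

print(greet("David"))
```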
How to expand and flatten a nested of list of dictionaries of varied lengths?
Even worse, in some cases an additional key called serviceRatings occurs unexpectedly with new data. How can I produce a robust Python/Pandas script to cope with all of this? Regards, David

u'historicRatings': [
{u'overall': {u'keyQuestionRatings': [{u'name': u'Safe', u'rating': u'Requires improvement'}, {u'name': u'Well-led', u'rating': u'Requires improvement'}], u'rating': u'Requires improvement'}, u'reportDate': u'2019-10-04', u'reportLinkId': u'63ff05ec-4d31-406e-83de-49a271cfdc43'},
{u'overall': {u'keyQuestionRatings': [{u'name': u'Safe', u'rating': u'Good'}, {u'name': u'Well-led', u'rating': u'Good'}, {u'name': u'Caring', u'rating': u'Good'}, {u'name': u'Responsive', u'rating': u'Good'}, {u'name': u'Effective', u'rating': u'Requires improvement'}], u'rating': u'Good'}, u'reportDate': u'2017-09-08', u'reportLinkId': u'4f20da40-89a4-4c45-a7f9-bfd52b48f286'},
{u'overall': {u'keyQuestionRatings': [{u'name': u'Safe', u'rating': u'Requires improvement'}, {u'name': u'Well-led', u'rating': u'Requires improvement'}, {u'name': u'Caring', u'rating': u'Requires improvement'}, {u'name': u'Responsive', u'rating': u'Requires improvement'}, {u'name': u'Effective', u'rating': u'Good'}], u'rating': u'Requires improvement'}, u'reportDate': u'2016-06-11', u'reportLinkId': u'0cc4226b-401e-4f0f-ba35-062cbadffa8f'},
{u'overall': {u'keyQuestionRatings': [{u'name': u'Safe', u'rating': u'Good'}, {u'name': u'Well-led', u'rating': u'Good'}, {u'name': u'Caring', u'rating': u'Good'}, {u'name': u'Responsive', u'rating': u'Requires improvement'}, {u'name': u'Effective', u'rating': u'Good'}], u'rating': u'Good'}, u'reportDate': u'2015-01-12', u'reportLinkId': u'a11c1e52-ddfd-4cd8-8b56-1b96ac287c96'}]
-- https://mail.python.org/mailman/listinfo/python-list
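One defensive approach is a plain-Python flattener that emits one record per key question per report, using .get() everywhere so that varied lengths and missing keys never raise. A sketch over a trimmed sample of the data shown above (unexpected extra sections such as serviceRatings could be handled the same way by iterating whatever rating sections are present):

```python
# trimmed sample of the historicRatings structure shown in the post
historic = [
    {"overall": {"rating": "Requires improvement",
                 "keyQuestionRatings": [
                     {"name": "Safe", "rating": "Requires improvement"},
                     {"name": "Well-led", "rating": "Requires improvement"}]},
     "reportDate": "2019-10-04"},
    {"overall": {"rating": "Good",
                 "keyQuestionRatings": [
                     {"name": "Safe", "rating": "Good"}]},
     "reportDate": "2017-09-08"},
]

def flatten_ratings(entries):
    """One flat row per key question per report; tolerant of varied lengths
    and missing keys via .get() with defaults."""
    rows = []
    for entry in entries:
        overall = entry.get("overall", {})
        for kq in overall.get("keyQuestionRatings", []):
            rows.append({
                "reportDate": entry.get("reportDate"),
                "overall": overall.get("rating"),
                "question": kq.get("name"),
                "rating": kq.get("rating"),
            })
    return rows

flat = flatten_ratings(historic)
print(len(flat))
```

The flat list of uniform dicts can then be handed to pandas.DataFrame(flat) without any "arrays must all be same length" trouble.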
Are there Python ways to execute queries on PostgreSQL without getting data over?
Are there Python ways to execute queries on PostgreSQL without getting data over? Are there ways just to fire off PostgreSQL queries and not get data into Python? Regards, David -- https://mail.python.org/mailman/listinfo/python-list
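Yes: with any DB-API driver (psycopg2 included), cursor.execute() runs the statement on the server and nothing crosses into Python unless a fetch* method is called afterwards. A sketch using the stdlib sqlite3 driver so it is runnable here; the execute/commit pattern is the same with a psycopg2 connection to PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL/DML statements run on the database side; no rows are fetched
cur.execute("CREATE TABLE t (n INTEGER)")
cur.execute("INSERT INTO t SELECT 1 UNION ALL SELECT 2")
conn.commit()  # with psycopg2 this is conn.commit() as well

# data only crosses into Python when fetch* is called explicitly
cur.execute("SELECT COUNT(*) FROM t")
count = cur.fetchone()[0]
print(count)
```

So a script that only issues UPDATE/INSERT/CREATE statements and commits never pulls result data into Python at all.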
Re: ValueError: arrays must all be same length
Hi, I managed to flatten it with json_normalize first. from pandas.io.json import json_normalize atable = json_normalize(d) atable Then, I got this table. brandId brandName careHome constituency currentRatings.overall.keyQuestionRatings currentRatings.overall.rating currentRatings.overall.reportDate currentRatings.overall.reportLinkId currentRatings.reportDate dormancy ... providerId region registrationDate registrationStatus regulatedActivities relationships reports specialisms type uprn 0 BD510 BRAND MACC Care Y Birmingham, Northfield [{u'reportDate': u'2020-10-01', u'rating': u'R... Requires improvement 2020-10-01 1157c975-c2f1-423e-a2b4-66901779e014 2020-10-01 N ... 1-101641521 West Midlands 2013-12-16 Registered [{u'code': u'RA2', u'name': u'Accommodation Then, I tried to expand the column of currentRatings.overall.keyQuestionRatings, with mydf = pd.DataFrame.from_dict(atable['currentRatings.overall.keyQuestionRatings'][0]) mydf Then, I got another table. name rating reportDate reportLinkId 0 Safe Requires improvement 2020-10-01 1157c975-c2f1-423e-a2b4-66901779e014 1 Well-led Requires improvement 2020-10-01 1157c975-c2f1-423e-a2b4-66901779e014 2 Caring Good 2019-10-04 63ff05ec-4d31-406e-83de-49a271cfdc43 3 Responsive Good 2019-10-04 63ff05ec-4d31-406e-83de-49a271cfdc43 4 Effective Requires improvement 2019-10-04 63ff05ec-4d31-406e-83de-49a271cfdc43 How can I re-arrange to get a flatten table? Apparently, the nested data is another table. Regards, Shao On Sun, 4 Oct 2020 at 13:55, Tim Williams wrote: > On Sun, Oct 4, 2020 at 8:39 AM Tim Williams wrote: > > > > > > > On Fri, Oct 2, 2020 at 11:00 AM Shaozhong SHI > > wrote: > > > >> Hello, > >> > >> I got a json response from an API and tried to use pandas to put data > into > >> a dataframe. > >> > >> However, I kept getting this ValueError: arrays must all be same length. > >> > >> Can anyone help? > >> > >> The following is the json text. 
Regards, Shao > >> > >> (snip json_text) > > > > > >> import pandas as pd > >> > >> import json > >> > >> j = json.JSONDecoder().decode(req.text) ###req.json > >> > >> df = pd.DataFrame.from_dict(j) > >> > > > > I copied json_text into a Jupyter notebook and got the same error trying > > to convert this into a pandas DataFrame:When I tried to copy this into a > > string, I got an error,, but without enclosing the paste in quotes, I got > > the dictionary. > > > > > (delete long response output) > > > > for k in json_text.keys(): > > if isinstance(json_text[k], list): > > print(k, len(json_text[k])) > > > > relationships 0 > > locationTypes 0 > > regulatedActivities 2 > > gacServiceTypes 1 > > inspectionCategories 1 > > specialisms 4 > > inspectionAreas 0 > > historicRatings 4 > > reports 5 > > > > HTH,. > > > > > This may also be more of a pandas issue. > > json.loads(json.dumps(json_text)) > > has a successful round-trip > > > > -- > >> https://mail.python.org/mailman/listinfo/python-list > >> > > > -- > https://mail.python.org/mailman/listinfo/python-list > -- https://mail.python.org/mailman/listinfo/python-list
How to handle a dictionary value that is a list
Hi, All, I was trying to handle the value of "personRoles" in a part of a json dictionary. Can anyone tell me various ways to handle this? Regards, Shao

"regulatedActivities": [
  {
    "name": "Accommodation for persons who require nursing or personal care",
    "code": "RA2",
    "contacts": [
      {
        "personTitle": "Mr",
        "personGivenName": "Steven",
        "personFamilyName": "Great",
        "personRoles": [
          "Registered Manager"
        ]

e, f = [], []
for result in d['regulatedActivities']:
    e.append(result['name'])
    for s in result['contacts']['personRoles']:
        t = (list)
        print s
        ###f.append(s)
f = d['regulatedActivities']['contacts']['personRoles']
df1 = pd.DataFrame([e, f]).T
-- https://mail.python.org/mailman/listinfo/python-list
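The snag in the attempt above is that "contacts" is a list of dicts, so result['contacts']['personRoles'] raises a TypeError; each contact has to be iterated (or indexed) before personRoles can be reached. A runnable sketch over the snippet from the post:

```python
d = {"regulatedActivities": [
    {"name": "Accommodation for persons who require nursing or personal care",
     "code": "RA2",
     "contacts": [
         {"personTitle": "Mr", "personGivenName": "Steven",
          "personFamilyName": "Great",
          "personRoles": ["Registered Manager"]}]}]}

names, roles = [], []
for activity in d["regulatedActivities"]:
    # "contacts" is a LIST of dicts, so iterate it rather than
    # indexing it with a string key
    for contact in activity.get("contacts", []):
        for role in contact.get("personRoles", []):
            names.append(activity["name"])
            roles.append(role)

print(list(zip(names, roles)))
```

The paired names/roles lists have equal length by construction, so they can go straight into a two-column DataFrame if needed.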
ValueError: arrays must all be same length
Hello, I got a json response from an API and tried to use pandas to put data into a dataframe. However, I kept getting this ValueError: arrays must all be same length. Can anyone help? The following is the json text. Regards, Shao { "locationId": "1-1004508435", "providerId": "1-101641521", "organisationType": "Location", "type": "Social Care Org", "name": "Meadow Rose Nursing Home", "brandId": "BD510", "brandName": "BRAND MACC Care", "onspdCcgCode": "E38000220", "onspdCcgName": "NHS Birmingham and Solihull CCG", "odsCode": "VM4G9", "uprn": "100070537642", "registrationStatus": "Registered", "registrationDate": "2013-12-16", "dormancy": "N", "numberOfBeds": 56, "postalAddressLine1": "96 The Roundabout", "postalAddressTownCity": "Birmingham", "postalAddressCounty": "West Midlands", "region": "West Midlands", "postalCode": "B31 2TX", "onspdLatitude": 52.399843, "onspdLongitude": -1.989241, "careHome": "Y", "inspectionDirectorate": "Adult social care", "mainPhoneNumber": "01214769808", "constituency": "Birmingham, Northfield", "localAuthority": "Birmingham", "lastInspection": { "date": "2020-06-24" }, "lastReport": { "publicationDate": "2020-10-01" }, "relationships": [ ], "locationTypes": [ ], "regulatedActivities": [ { "name": "Accommodation for persons who require nursing or personal care", "code": "RA2", "contacts": [ { "personTitle": "Mr", "personGivenName": "Steven", "personFamilyName": "Kazembe", "personRoles": [ "Registered Manager" ] } ] }, { "name": "Treatment of disease, disorder or injury", "code": "RA5", "contacts": [ { "personTitle": "Mr", "personGivenName": "Steven", "personFamilyName": "Kazembe", "personRoles": [ "Registered Manager" ] } ] } ], "gacServiceTypes": [ { "name": "Nursing homes", "description": "Care home service with nursing" } ], "inspectionCategories": [ { "code": "S1", "primary": "true", "name": "Residential social care" } ], "specialisms": [ { "name": "Caring for adults over 65 yrs" }, { "name": "Caring for adults under 65 yrs" }, { 
"name": "Dementia" }, { "name": "Physical disabilities" } ], "inspectionAreas": [ ], "currentRatings": { "overall": { "rating": "Requires improvement", "reportDate": "2020-10-01", "reportLinkId": "1157c975-c2f1-423e-a2b4-66901779e014", "useOfResources": { }, "keyQuestionRatings": [ { "name": "Safe", "rating": "Requires improvement", "reportDate": "2020-10-01", "reportLinkId": "1157c975-c2f1-423e-a2b4-66901779e014" }, { "name": "Well-led", "rating": "Requires improvement", "reportDate": "2020-10-01", "reportLinkId": "1157c975-c2f1-423e-a2b4-66901779e014" }, { "name": "Caring", "rating": "Good", "reportDate": "2019-10-04", "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43" }, { "name": "Responsive", "rating": "Good", "reportDate": "2019-10-04", "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43" }, { "name": "Effective", "rating": "Requires improvement", "reportDate": "2019-10-04", "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43" } ] }, "reportDate": "2020-10-01" }, "historicRatings": [ { "reportLinkId": "63ff05ec-4d31-406e-83de-49a271cfdc43", "reportDate": "2019-10-04", "overall": { "rating": "Requires improvement", "keyQuestionRatings": [ { "name": "Safe", "rating": "Requires improvement" }, { "name": "Well-led", "rating": "Requires improvement" } ] } }, { "reportLinkId": "4f20da40-89a4-4c45-a7f9-bfd52b48f286", "reportDate": "2017-09-08", "overall": { "rating": "Good", "keyQuestionRatings": [ { "name": "Safe", "rating": "Good" }, { "name": "Well-led", "rating": "Good" }, { "name": "Caring", "rating": "Good" }, { "name": "Responsive", "rating": "Good" }, { "name": "Effective", "rating": "Requires improvement" } ] } }, { "reportLinkId": "0cc4226b-401e-4f0f-ba35-062cbadffa8f", "reportDate": "2016-06-11", "overall": { "rating": "Requires
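[Editor's note: a hedged sketch of one way around the ValueError in the message above. `pd.DataFrame.from_dict` tries to make each top-level key a column of equal length, which fails because the response mixes scalars with lists of different lengths; `pd.json_normalize` instead treats the whole dict as a single record. The `json_text` below is a trimmed, hypothetical stand-in for the full response.]

```python
import json
import pandas as pd

# Trimmed stand-in for the API response: top-level scalars mixed with
# lists of different lengths, which is what makes
# pd.DataFrame.from_dict raise "arrays must all be same length".
json_text = '''
{
  "locationId": "1-1004508435",
  "name": "Meadow Rose Nursing Home",
  "specialisms": [
    {"name": "Dementia"},
    {"name": "Physical disabilities"}
  ],
  "inspectionAreas": []
}
'''

j = json.loads(json_text)

# json_normalize flattens the dict into a single one-row frame;
# list-valued fields stay as lists inside one cell instead of being
# forced into equal-length columns.
df = pd.json_normalize(j)
```

Nested lists such as "specialisms" can then be expanded separately with record_path/meta, as discussed earlier in this thread.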