Il giorno sabato 8 gennaio 2022 alle 23:01:13 UTC+1 Avi Gross ha scritto: > I have to wonder if when something looks like HOMEWORK, if it should be > answered in detail, let alone using methods beyond what is expected in class. > The goal of this particular project seems to be to find one (or perhaps more) > columns in some data structure like a dataframe that match two conditions > (containing a copy of two numbers in one or more places) and then KNOW what > column it was in. The reason I say that is because the next fairly > nonsensical request is to then explicitly return what that column has in the > row called 2, meaning the third row. > Perhaps stated another way: "what it the item in row/address 2 of the column > that somewhere contains two additional specified contents called key1 and > key2" > My guess is that if the instructor wanted this to be solved using methods > being taught, then loops may well be a way to go. Python and numpy/pandas > make it often easier to do things with columns rather than in rows across > them, albeit many things allow you to specify an axis. So, yes, transposing > is a way to go that transforms the problem in a way easier to solve without > thinking deeply. Some other languages allow relatively easy access in both > directions of horizontally versus vertically. And this may be an example > where solving it as a list of lists may also be easier. > Is the solution at the bottom a solution? Before I check, I want to see if I > understand the required functionality and ask if it is completely and > unambiguously specified. > For completeness, the question being asked may need to deal with a uniqueness > issue. Is it possible multiple columns match the request and thus more than > one answer is required to be returned? Is the row called 2 allowed to > participate in the match or must it be excluded and the question becomes to > find one (or more) columns that contain key1 somewhere else than row 2 and > key2 (which may have to be different than key1 or not) somewhere else and > THEN provide the corresponding entry from row 2 and that (or those) > column(s)? > So in looking at the solution offered, what exactly was this supposed to do > when dft is the transpose? > idt = (dft[0] == 1) & (dft[1] == 5) > Was the code (way below in this message) tried out or just written for us to > ponder? I tried it. I got an answer of: 0 1 2 > V2 1 5 6 > That is not my understanding of what was requested. Row 2 (shown transposed > as a column) is being shown as a whole. The request was for item "2" which > would be just 6. Something more like this: > print(dft[idt][2]) > > But the code makes no sense to me. seems to explicitly test the first column > (0) to see if it contains a 1 and then the second column (1) to see if it > contains a 5. Not sure who cares about this hard-wired query as this is not > my understanding of the question. You want any of the original three rows > (now transposed) tested to see if it contains BOTH. > I may have read the requirements wrong or it may not be explained well. Until > I am sure what is being asked and whether there is a good reason someone > wants a different solution, I see no reason to provide yet another > solution.But just for fund, assuming dft contains the transpose of the > original data, will this work? > first = dft[dft.values == key1 ]second = first[first.values == key2 > ]print(second[2]) > I get a 6 as an answer and suppose it could be done in one more complex > expression if needed! LOL! > -----Original Message----- > From: Edmondo Giovannozzi <edmondo.g...@gmail.com> > To: pytho...@python.org > Sent: Sat, Jan 8, 2022 8:00 am > Subject: Re: Extracting dataframe column with multiple conditions on row > values > > Il giorno sabato 8 gennaio 2022 alle 02:21:40 UTC+1 dn ha scritto: > > Salaam Mahmood, > > On 08/01/2022 12.07, Mahmood Naderan via Python-list wrote: > > > I have a csv file like this > > > V0,V1,V2,V3 > > > 4,1,1,1 > > > 6,4,5,2 > > > 2,3,6,7 > > > > > > And I want to search two rows for a match and find the column. For > > > example, I want to search row[0] for 1 and row[1] for 5. The > > > corresponding > > > column is V2 (which is the third column). Then I want to return the value > > > at row[2] and the found column. The result should be 6 then. > > Not quite: isn't the "found column" also required? > > > I can manually extract the specified rows (with index 0 and 1 which are > > > fixed) and manually iterate over them like arrays to find a match. Then I > > Perhaps this idea has been influenced by a similar solution in another > > programming language. May I suggest that the better-answer you seek lies > > in using Python idioms (as well as Python's tools)... > > > key1 = 1 > > > key2 = 5 > > Fine, so far - excepting that this 'problem' is likely to be a small > > part of some larger system. Accordingly, consider writing it as a > > function. In which case, these two "keys" will become > > function-parameters (and the two 'results' become return-values). > > > row1 = df.iloc[0] # row=[4,1,1,1] > > > row2 = df.iloc[1] # row=[6,4,5,2] > > This is likely not native-Python. Let's create lists for 'everything', > > just-because: > > > > >>> headings = [ "V0","V1","V2","V3" ] > > >>> row1 = [4,1,1,1] > > >>> row2 = [6,4,5,2] > > >>> results = [ 2,3,6,7 ] > > > > > > Note how I'm using the Python REPL (in a "terminal", type "python" (as > > appropriate to your OpSys) at the command-line). IMHO the REPL is a > > grossly under-rated tool, and is a very good means towards > > trial-and-error, and learning by example. Highly recommended! > > > > > > > for i in range(len(row1)): > > > > This construction is very much a "code smell" for thinking that it is > > not "pythonic". (and perhaps the motivation for this post) > > > > In Python (compared with many other languages) the "for" loop should > > actually be pronounced "for-each". In other words when we pair the > > code-construct with a list (for example): > > > > for each item in the list the computer should perform some suite of > > commands. > > > > (the "suite" is everything 'inside' the for-each-loop - NB my > > 'Python-betters' will quickly point-out that this feature is not limited > > to Python-lists, but will work with any :iterable" - ref: > > https://docs.python.org/3/tutorial/controlflow.html#for-statements) > > > > > > Thus: > > > > > for item in headings: print( item ) > > ... > > V0 > > V1 > > V2 > > V3 > > > > > > The problem is that when working with matrices/matrixes, a math > > background equips one with the idea of indices/indexes, eg the > > ubiquitous subscript-i. Accordingly, when reading 'math' where a formula > > uses the upper-case Greek "sigma" character, remember that it means "for > > all" or "for each"! > > > > So, if Python doesn't use indexing or "pointers", how do we deal with > > the problem? > > > > Unfortunately, at first glance, the pythonic approach may seem > > more-complicated or even somewhat convoluted, but once the concepts > > (and/or the Python idioms) are learned, it is quite manageable (and > > applicable to many more applications than matrices/matrixes!)... > > > if row1[i] == key1: > > > for j in range(len(row2)): > > > if row2[j] == key2: > > > res = df.iloc[:,j] > > > print(res) # 6 > > > > > > Is there any way to use built-in function for a more efficient code? > > This is where your idea bears fruit! > > > > There is a Python "built-in function": zip(), which will 'join' lists. > > NB do not become confused between zip() and zip archive/compressed files! > > > > Most of the time reference book and web-page examples show zip() being > > used to zip-together two lists into a single data-construct (which is an > > iterable(!)). However, zip() will actually zip-together multiple (more > > than two) "iterables". As the manual says: > > > > «zip() returns an iterator of tuples, where the i-th tuple contains the > > i-th element from each of the argument iterables.» > > > > Ah, so that's where the math-idea of subscript-i went! It has become > > 'hidden' in Python's workings - or putting that another way: Python > > looks after the subscripting for us (and given that 'out by one' errors > > in pointers is a major source of coding-error in other languages, > > thank-you very much Python!) > > > > First re-state the source-data as Python lists, (per above) - except > > that I recommend the names be better-chosen to be more meaningful (to > > your application)! > > > > > > Now, (in the REPL) try using zip(): > > > > >>> zip( headings, row1, row2, results ) > > <zip object at 0x7f655cca6bc0> > > > > Does that seem a very good illustration? Not really, but re-read the > > quotation from the manual (above) where it says that zip returns an > > iterator. If we want to see the values an iterator will produce, then > > turn it into an iterable data-structure, eg: > > > > >>> list( zip( headings, row1, row2, results ) ) > > [('V0', 4, 6, 2), ('V1', 1, 4, 3), ('V2', 1, 5, 6), ('V3', 1, 2, 7)] > > > > or, to see things more clearly, let me re-type it as: > > > > [ > > ('V0', 4, 6, 2), > > ('V1', 1, 4, 3), > > ('V2', 1, 5, 6), > > ('V3', 1, 2, 7) > > ] > > > > > > What we now see is actually a "transpose" of the original 'matrix' > > presented in the post/question! > > > > (NB Python will perform this layout for us - read about the pprint library) > > > > > > Another method which can also be employed (and which will illustrate the > > loop required to code the eventual-solution(!)) is that Python's next() > > will extract the first row of the transpose: > > > > >>> row = next( zip( headings, row1, row2, results ) ) > > >>> row > > ('V0', 4, 6, 2) > > > > > > This is all-well-and-good, but that result is a tuple of four items > > (corresponding to one column in the way the source-data was explained). > > > > If we need to consider the four individual data-items, that can be > > improved using a Python feature called "tuple unpacking". Instead of the > > above delivering a tuple which is then assigned to "row", the tuple can > > be assigned to four "identifiers", eg > > > > >>> heading, row1_item, row2_item, result= next( zip( headings, row1, > > row2, results ) ) > > > > (apologies about email word-wrapping - this is a single line of > > Python-code) > > > > > > Which, to prove the case, could be printed: > > > > >>> heading, row1_item, row2_item, result > > ('V0', 4, 6, 2) > > > > > > (ref: > > https://docs.python.org/3/tutorial/datastructures.html?highlight=tuple%20unpacking#tuples-and-sequences) > > > > > > > > Thus, if we repeatedly ask for the next() row from the zip-ped > > transpose, eventually it will respond with the row starting 'V2' - which > > is the desired-result, ie the row containing the 1, the 5, and the 6 - > > and if you follow-through using the REPL, will be clearly visible. > > > > > > Finally, 'all' that is required, is a for-each-loop which will iterate > > across/down the zip object, one tuple (row of the transpose) at a time, > > AND perform the "tuple-unpacking" all in one command, with an > > if-statement to detect the correct row/column: > > > > >>> for *tuple-unpacking* in *zip() etc*: > > ... if row1_item == *what?* and row2_item == *what?* > > ... print( *which* and *which identifier* ) > > ... > > V2 6 > > > > Yes, three lines. It's as easy as that! > > (when you know how) > > > > Worse: when you become more expert, you'll be able to compress all of > > that down into a single-line solution - but it won't be as "readable" as > > is this! > > > > > > NB this question has a 'question-smell' of 'homework', so I'll not > > complete the code for you - this is something *you* asked to learn and > > the best way to learn is by 'doing' (not by 'reading'). > > > > However, please respond with your solution, or any further question > > (with the next version of the code so-far, per this first-post - which > > we appreciate!) > > > > Regardless, you asked 'the right question' (curiosity is the key to > > learning) and in the right way/manner. Well done! > > > > > > NBB the above code-outline does not consider the situation where the > > search fails/the keys are not found! > > > > > > For further information, please review: > > https://docs.python.org/3/library/functions.html?highlight=zip#zip > > > > Also, further to the above discussion of combining lists and loops: > > https://docs.python.org/3/tutorial/datastructures.html?highlight=zip#looping-techniques > > > > > > and with a similar application (to this post): > > https://docs.python.org/3/faq/programming.html?highlight=zip#how-can-i-sort-one-list-by-values-from-another-list > > > > > > -- > > Regards, > > You may also transpose your dataset. Then the index will become your column > name and the column name become your index: > To read your dataset: > > import pandas as pd > import io > > DN = """ > V0,V1,V2,V3 > 4,1,1,1 > 6,4,5,2 > 2,3,6,7 > """ > df = pd.read_csv(io.StringIO(DN)) > > Transpose it: > > dft = df.T > > Find all the index with your condition: > > idt = (dft[0] == 1) & (dft[1] == 5) > > Print the columns that satisfy your condition: > > print(dft[idt]) > > As you see, without explicit loop. > -- > https://mail.python.org/mailman/listinfo/python-list
I was just showing that one can transpose the dataframe and use logical indexing, as the OP was asking for a fast and efficient solution. It is just a hint and not a complete solution. And, of course, one have to put everything inside a function. And yes, I tested what I have written. There could be other problems, of course, if some columns are not numeric, but all this details depend on the problem at hands. But, I may not have understood completely the problem presented by the OP. Cheers, :-) -- https://mail.python.org/mailman/listinfo/python-list