I have to wonder if when something looks like HOMEWORK, if it should be answered in detail, let alone using methods beyond what is expected in class. The goal of this particular project seems to be to find one (or perhaps more) columns in some data structure like a dataframe that match two conditions (containing a copy of two numbers in one or more places) and then KNOW what column it was in. The reason I say that is because the next fairly nonsensical request is to then explicitly return what that column has in the row called 2, meaning the third row. Perhaps stated another way: "what it the item in row/address 2 of the column that somewhere contains two additional specified contents called key1 and key2" My guess is that if the instructor wanted this to be solved using methods being taught, then loops may well be a way to go. Python and numpy/pandas make it often easier to do things with columns rather than in rows across them, albeit many things allow you to specify an axis. So, yes, transposing is a way to go that transforms the problem in a way easier to solve without thinking deeply. Some other languages allow relatively easy access in both directions of horizontally versus vertically. And this may be an example where solving it as a list of lists may also be easier. Is the solution at the bottom a solution? Before I check, I want to see if I understand the required functionality and ask if it is completely and unambiguously specified. For completeness, the question being asked may need to deal with a uniqueness issue. Is it possible multiple columns match the request and thus more than one answer is required to be returned? Is the row called 2 allowed to participate in the match or must it be excluded and the question becomes to find one (or more) columns that contain key1 somewhere else than row 2 and key2 (which may have to be different than key1 or not) somewhere else and THEN provide the corresponding entry from row 2 and that (or those) column(s)? So in looking at the solution offered, what exactly was this supposed to do when dft is the transpose? idt = (dft[0] == 1) & (dft[1] == 5) Was the code (way below in this message) tried out or just written for us to ponder? I tried it. I got an answer of: 0 1 2 V2 1 5 6 That is not my understanding of what was requested. Row 2 (shown transposed as a column) is being shown as a whole. The request was for item "2" which would be just 6. Something more like this: print(dft[idt][2])
But the code makes no sense to me. seems to explicitly test the first column (0) to see if it contains a 1 and then the second column (1) to see if it contains a 5. Not sure who cares about this hard-wired query as this is not my understanding of the question. You want any of the original three rows (now transposed) tested to see if it contains BOTH. I may have read the requirements wrong or it may not be explained well. Until I am sure what is being asked and whether there is a good reason someone wants a different solution, I see no reason to provide yet another solution.But just for fund, assuming dft contains the transpose of the original data, will this work? first = dft[dft.values == key1 ]second = first[first.values == key2 ]print(second[2]) I get a 6 as an answer and suppose it could be done in one more complex expression if needed! LOL! -----Original Message----- From: Edmondo Giovannozzi <edmondo.giovanno...@gmail.com> To: python-list@python.org Sent: Sat, Jan 8, 2022 8:00 am Subject: Re: Extracting dataframe column with multiple conditions on row values Il giorno sabato 8 gennaio 2022 alle 02:21:40 UTC+1 dn ha scritto: > Salaam Mahmood, > On 08/01/2022 12.07, Mahmood Naderan via Python-list wrote: > > I have a csv file like this > > V0,V1,V2,V3 > > 4,1,1,1 > > 6,4,5,2 > > 2,3,6,7 > > > > And I want to search two rows for a match and find the column. For > > example, I want to search row[0] for 1 and row[1] for 5. The corresponding > > column is V2 (which is the third column). Then I want to return the value > > at row[2] and the found column. The result should be 6 then. > Not quite: isn't the "found column" also required? > > I can manually extract the specified rows (with index 0 and 1 which are > > fixed) and manually iterate over them like arrays to find a match. Then I > Perhaps this idea has been influenced by a similar solution in another > programming language. May I suggest that the better-answer you seek lies > in using Python idioms (as well as Python's tools)... > > key1 = 1 > > key2 = 5 > Fine, so far - excepting that this 'problem' is likely to be a small > part of some larger system. Accordingly, consider writing it as a > function. In which case, these two "keys" will become > function-parameters (and the two 'results' become return-values). > > row1 = df.iloc[0] # row=[4,1,1,1] > > row2 = df.iloc[1] # row=[6,4,5,2] > This is likely not native-Python. Let's create lists for 'everything', > just-because: > > >>> headings = [ "V0","V1","V2","V3" ] > >>> row1 = [4,1,1,1] > >>> row2 = [6,4,5,2] > >>> results = [ 2,3,6,7 ] > > > Note how I'm using the Python REPL (in a "terminal", type "python" (as > appropriate to your OpSys) at the command-line). IMHO the REPL is a > grossly under-rated tool, and is a very good means towards > trial-and-error, and learning by example. Highly recommended! > > > > for i in range(len(row1)): > > This construction is very much a "code smell" for thinking that it is > not "pythonic". (and perhaps the motivation for this post) > > In Python (compared with many other languages) the "for" loop should > actually be pronounced "for-each". In other words when we pair the > code-construct with a list (for example): > > for each item in the list the computer should perform some suite of > commands. > > (the "suite" is everything 'inside' the for-each-loop - NB my > 'Python-betters' will quickly point-out that this feature is not limited > to Python-lists, but will work with any :iterable" - ref: > https://docs.python.org/3/tutorial/controlflow.html#for-statements) > > > Thus: > > > for item in headings: print( item ) > ... > V0 > V1 > V2 > V3 > > > The problem is that when working with matrices/matrixes, a math > background equips one with the idea of indices/indexes, eg the > ubiquitous subscript-i. Accordingly, when reading 'math' where a formula > uses the upper-case Greek "sigma" character, remember that it means "for > all" or "for each"! > > So, if Python doesn't use indexing or "pointers", how do we deal with > the problem? > > Unfortunately, at first glance, the pythonic approach may seem > more-complicated or even somewhat convoluted, but once the concepts > (and/or the Python idioms) are learned, it is quite manageable (and > applicable to many more applications than matrices/matrixes!)... > > if row1[i] == key1: > > for j in range(len(row2)): > > if row2[j] == key2: > > res = df.iloc[:,j] > > print(res) # 6 > > > > Is there any way to use built-in function for a more efficient code? > This is where your idea bears fruit! > > There is a Python "built-in function": zip(), which will 'join' lists. > NB do not become confused between zip() and zip archive/compressed files! > > Most of the time reference book and web-page examples show zip() being > used to zip-together two lists into a single data-construct (which is an > iterable(!)). However, zip() will actually zip-together multiple (more > than two) "iterables". As the manual says: > > «zip() returns an iterator of tuples, where the i-th tuple contains the > i-th element from each of the argument iterables.» > > Ah, so that's where the math-idea of subscript-i went! It has become > 'hidden' in Python's workings - or putting that another way: Python > looks after the subscripting for us (and given that 'out by one' errors > in pointers is a major source of coding-error in other languages, > thank-you very much Python!) > > First re-state the source-data as Python lists, (per above) - except > that I recommend the names be better-chosen to be more meaningful (to > your application)! > > > Now, (in the REPL) try using zip(): > > >>> zip( headings, row1, row2, results ) > <zip object at 0x7f655cca6bc0> > > Does that seem a very good illustration? Not really, but re-read the > quotation from the manual (above) where it says that zip returns an > iterator. If we want to see the values an iterator will produce, then > turn it into an iterable data-structure, eg: > > >>> list( zip( headings, row1, row2, results ) ) > [('V0', 4, 6, 2), ('V1', 1, 4, 3), ('V2', 1, 5, 6), ('V3', 1, 2, 7)] > > or, to see things more clearly, let me re-type it as: > > [ > ('V0', 4, 6, 2), > ('V1', 1, 4, 3), > ('V2', 1, 5, 6), > ('V3', 1, 2, 7) > ] > > > What we now see is actually a "transpose" of the original 'matrix' > presented in the post/question! > > (NB Python will perform this layout for us - read about the pprint library) > > > Another method which can also be employed (and which will illustrate the > loop required to code the eventual-solution(!)) is that Python's next() > will extract the first row of the transpose: > > >>> row = next( zip( headings, row1, row2, results ) ) > >>> row > ('V0', 4, 6, 2) > > > This is all-well-and-good, but that result is a tuple of four items > (corresponding to one column in the way the source-data was explained). > > If we need to consider the four individual data-items, that can be > improved using a Python feature called "tuple unpacking". Instead of the > above delivering a tuple which is then assigned to "row", the tuple can > be assigned to four "identifiers", eg > > >>> heading, row1_item, row2_item, result= next( zip( headings, row1, > row2, results ) ) > > (apologies about email word-wrapping - this is a single line of Python-code) > > > Which, to prove the case, could be printed: > > >>> heading, row1_item, row2_item, result > ('V0', 4, 6, 2) > > > (ref: > https://docs.python.org/3/tutorial/datastructures.html?highlight=tuple%20unpacking#tuples-and-sequences) > > > > Thus, if we repeatedly ask for the next() row from the zip-ped > transpose, eventually it will respond with the row starting 'V2' - which > is the desired-result, ie the row containing the 1, the 5, and the 6 - > and if you follow-through using the REPL, will be clearly visible. > > > Finally, 'all' that is required, is a for-each-loop which will iterate > across/down the zip object, one tuple (row of the transpose) at a time, > AND perform the "tuple-unpacking" all in one command, with an > if-statement to detect the correct row/column: > > >>> for *tuple-unpacking* in *zip() etc*: > ... if row1_item == *what?* and row2_item == *what?* > ... print( *which* and *which identifier* ) > ... > V2 6 > > Yes, three lines. It's as easy as that! > (when you know how) > > Worse: when you become more expert, you'll be able to compress all of > that down into a single-line solution - but it won't be as "readable" as > is this! > > > NB this question has a 'question-smell' of 'homework', so I'll not > complete the code for you - this is something *you* asked to learn and > the best way to learn is by 'doing' (not by 'reading'). > > However, please respond with your solution, or any further question > (with the next version of the code so-far, per this first-post - which > we appreciate!) > > Regardless, you asked 'the right question' (curiosity is the key to > learning) and in the right way/manner. Well done! > > > NBB the above code-outline does not consider the situation where the > search fails/the keys are not found! > > > For further information, please review: > https://docs.python.org/3/library/functions.html?highlight=zip#zip > > Also, further to the above discussion of combining lists and loops: > https://docs.python.org/3/tutorial/datastructures.html?highlight=zip#looping-techniques > > > and with a similar application (to this post): > https://docs.python.org/3/faq/programming.html?highlight=zip#how-can-i-sort-one-list-by-values-from-another-list > > > -- > Regards, You may also transpose your dataset. Then the index will become your column name and the column name become your index: To read your dataset: import pandas as pd import io DN = """ V0,V1,V2,V3 4,1,1,1 6,4,5,2 2,3,6,7 """ df = pd.read_csv(io.StringIO(DN)) Transpose it: dft = df.T Find all the index with your condition: idt = (dft[0] == 1) & (dft[1] == 5) Print the columns that satisfy your condition: print(dft[idt]) As you see, without explicit loop. -- https://mail.python.org/mailman/listinfo/python-list -- https://mail.python.org/mailman/listinfo/python-list