Re: [Tutor] adding numpy to pandas

2018-06-21 Thread Peter Otten
Mats Wichmann wrote:

> On 06/20/2018 02:04 PM, Glenn Schultz wrote:
>> All,
>> 
>> I have a pandas dataframe and a predict result (numpy array) of a
>> classifier [[0,1],[1,0]].  What I would like to do is add the positive to
>> the pandas dataframe.  I use predict[:,1] to slice the positive from
>> numpy which gives me a row of the result.  but I cannot concat to the
>> pandas df['result'] = predict[:,1] does not work and I have tried
>> various ways to do this with no result.  I am missing something here.
> 
> You should take a look here:
> 
> https://pandas.pydata.org/community.html
> 
> History has indicated that the Python tutor group isn't overloaded with
> Pandas experts. You may still get an answer here, but that page suggests
> the preferred places from the community to interact with to get good
> answers.  There's also a Google Groups which doesn't seem to be
> mentioned on the page:
> 
> https://groups.google.com/forum/#!forum/pydata

Regardless of the chosen forum, try to be as precise as possible with your 
problem description. It really can't get any worse than "does not work".

I tried but failed to reproduce your problem from what little information 
you provide:

>>> a = np.array([[0,1],[1,0]])
>>> df = pd.DataFrame([[1,2], [3,4]], columns=["a", "b"])
>>> df["result"] = a[:,1]
>>> df
   a  b  result
0  1  2       1
1  3  4       0
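For what it's worth, one failure mode that does reproduce easily is a length
mismatch between the prediction array and the frame's index -- if that is what
happened, the actual traceback would have said so. A sketch with made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
predict = np.array([[0, 1], [1, 0], [1, 0]])  # three rows, but df has two

try:
    df["result"] = predict[:, 1]  # length 3 vs index length 2
except ValueError as exc:
    print("assignment failed:", exc)
```

The error message names both lengths, which is exactly the kind of detail a
good problem report should include.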

Please take the time to read http://sscce.org/ to learn how you can improve 
your question. You'll be rewarded with better answers from us or by the real 
experts elsewhere.

Thank you.


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] adding numpy to pandas

2018-06-21 Thread Mats Wichmann
On 06/20/2018 02:04 PM, Glenn Schultz wrote:
> All,
> 
> I have a pandas dataframe and a predict result (numpy array) of a
> classifier [[0,1],[1,0]].  What I would like to do is add the positive to
> the pandas dataframe.  I use predict[:,1] to slice the positive from
> numpy which gives me a row of the result.  but I cannot concat to the
> pandas df['result'] = predict[:,1] does not work and I have tried
> various ways to do this with no result.  I am missing something here.

You should take a look here:

https://pandas.pydata.org/community.html

History has indicated that the Python tutor group isn't overloaded with
Pandas experts. You may still get an answer here, but that page suggests
the preferred places from the community to interact with to get good
answers.  There's also a Google Groups which doesn't seem to be
mentioned on the page:

https://groups.google.com/forum/#!forum/pydata



Re: [Tutor] Parsing and collecting keywords from a webpage

2018-06-21 Thread Peter Otten
Daniel Bosah wrote:

> new_list = [x.encode('latin-1') for x in sorted(paul)]

I don't see why you would need bytes

>   search = "(" + b"|".join(new_list).decode() + ")" + "" #re.compile needs

when your next step is to decode it. I'm not sure why it even works as the 
default encoding is usually UTF-8.

> u'José Antonio (Pepillo) Salcedo'

Those parens combined with
> 
>   search = "(" + b"|".join(new_list).decode() + ")" + "" #re.compile needs
> string as first argument, so adds string to be first argument, and joins
> the strings together with john
> 
>  # print (type(search))
>   pattern = re.compile(search)#compiles search to be a regex object
>   reg = pattern.findall(str(soup))#calls findall on pattern, which findall

will cause findall() to return a list of 2-tuples:

>>> pattern = re.compile("(" + "|".join(["foo", "bar(baz)"]) + ")")
>>> pattern.findall("yadda foo yadda bar(baz)")
[('foo', '')]

Applying re.escape() can prevent that:

>>> escaped = (re.escape(s) for s in ["foo", "bar(baz)"])
>>> pattern = re.compile("(" + "|".join(escaped) + ")")
>>> pattern.findall("yadda foo yadda bar(baz)")
['foo', 'bar(baz)']
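Wrapped into a standalone sketch (the keyword list here is invented):

```python
import re

keywords = ["foo", "bar(baz)"]  # sample keywords; the parens are literal text

# Escape each keyword so regex metacharacters match literally, then
# join them into a single alternation group.
pattern = re.compile("(" + "|".join(re.escape(k) for k in keywords) + ")")

matches = pattern.findall("yadda foo yadda bar(baz)")
print(matches)  # ['foo', 'bar(baz)']
```

With the escaping in place there is only one capturing group, so findall()
returns plain strings instead of tuples.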


>  if i in reg and paul: # this loop checks to see if elements are in
> both the regexed parsed list and the list. If i is in both, it is added to
> list.

No it doesn't, it is equivalent to

if (i in reg) and bool(paul):
...

or, since paul is a list

if (i in reg) and len(paul) > 0:
...

for non-empty lists effectively

if i in reg:
...
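The difference is easy to demonstrate in isolation (names invented):

```python
reg = ["alpha", "beta"]
paul = ["gamma"]  # non-empty, so bool(paul) is True

# `x in reg and paul` parses as `(x in reg) and paul`,
# NOT as `x in reg and x in paul`.
broken = [x for x in reg if x in reg and paul]
fixed = [x for x in reg if x in reg and x in paul]

print(broken)  # every element survives: ['alpha', 'beta']
print(fixed)   # the intended intersection: []
```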
> sets.append(str(i))
> with open('sets.txt', 'w') as f:
> f.write(str(sets))

This writes a single line of the form

['first', 'second item', ...]

> f.close()

No need to close() the file explicitly -- with open() already implies that 
and operates more reliably (the file will be closed even if an exception is 
raised in the with-suite).

> def regexparse(regex):
> monum = [u'road', u'blvd', u'street', u'town', u'city', u'Bernardo Vega']
> setss = []
> 
> f = open('sets.txt', 'rt')
> f = list(f)

From my explanation above it follows that the list f contains a single 
string (and one that does not occur in monum), so setss should always be empty.
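If the goal is to round-trip the keywords through the file, writing one item
per line would make it work; a sketch with made-up data (not your actual
keywords or file name):

```python
import os
import tempfile

keywords = ["road", "city", "elm"]
monum = ["road", "blvd", "street", "town", "city"]

path = os.path.join(tempfile.gettempdir(), "sets_demo.txt")

# Write one keyword per line instead of str(list_of_keywords) ...
with open(path, "w") as f:
    for word in keywords:
        f.write(word + "\n")

# ... so that reading the file back yields the individual words again.
with open(path) as f:
    saved = [line.strip() for line in f]

shared = [w for w in saved if w in monum]
print(shared)  # ['road', 'city']
```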

> for i in f:
>if i in f and i in monum:
>   setss.append(i)
> #with open ('regex.txt','w') as q:
> #q.write(str(setss))
># q.close()
> print (setss)
> 
> 
> if __name__ == '__main__':
>regexparse(regex('https://en.wikipedia.org/wiki/List_of_people_from_the_Dominican_Republic'))
> 
> 
> What this code is doing is basically going through a webpage using
> BeautifulSoup and regex to compare a regexed list of words ( in regex ) to
> a list of keywords and then writing them to a textfile. The next function
> (regexparse) then goes and has an empty list (setss), then reads the
> textfile from the previous function.  What I want to do, in a for loop, is
> check to see if words in monum and the textfile ( from the regex function
> ) are shared, and if so , those shared words get added to the empty
> list(setss) , then written to a file ( this code is going to be added to a
> web crawler, and is basically going to be adding words and phrases to a
> txtfile as it crawls through the internet. ).
> 
> However, every time I run the current code, I get all the
> textfile(sets.txt) from the previous ( regex ) function, even though all I
> want are words and phrases shared between the textfile from regex and the
> monum list from regexparse. How can I fix this?

Don't write a complete script and then cross your fingers hoping that it 
will work as expected -- that rarely happens even to people with more 
experience; they just find their errors more quickly ;). Instead start with 
the first step, add print calls generously, and only continue working on the 
next step when you are sure that the first does exactly what you want.

Once your scripts get more complex replace visual inspection via print()
with a more formal approach

https://docs.python.org/dev/library/unittest.html
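A minimal example of what such a test might look like (the function under
test is invented for illustration):

```python
import re
import unittest

def find_keywords(text, keywords):
    """Return the keywords that occur literally in text, in match order."""
    pattern = re.compile("|".join(re.escape(k) for k in keywords))
    return pattern.findall(text)

class FindKeywordsTest(unittest.TestCase):
    def test_literal_parentheses_are_matched(self):
        self.assertEqual(find_keywords("foo bar(baz)", ["bar(baz)"]),
                         ["bar(baz)"])

    def test_no_match_gives_empty_list(self):
        self.assertEqual(find_keywords("nothing here", ["foo"]), [])
```

Saved as e.g. test_keywords.py this can be run with
python -m unittest test_keywords.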




Re: [Tutor] Parsing and collecting keywords from a webpage

2018-06-21 Thread Alan Gauld via Tutor
On 20/06/18 20:32, Daniel Bosah wrote:

>   reg = pattern.findall(str(soup))
> 
>   for i in reg:
>  if i in reg and paul: # this loop checks to see if elements are in
> both the regexed parsed list and the list. 

No it doesn't. It checks if i is in reg and
if paul is non-empty - which it always is.
So this if test is really just testing if
i is in reg. That is also always true since
the for loop is iterating over reg.

So you are effectively saying

if True and True

or

if True.

What you really wanted was something like

if i in reg and i in paul:

But since you know i is in reg you can drop
that bit to get

if i in paul:



> sets.append(str(i))

Because the if is always true you always add i to sets


> with open('sets.txt', 'w') as f:
> f.write(str(sets))
> f.close()

Why not just wait until the end? Writing the entire sets structure
to a file each time is very wasteful. Alternatively use the
append mode and just write the new item to the file.
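A sketch of that append-mode variant (file name and items made up):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "sets_append_demo.txt")
if os.path.exists(path):
    os.remove(path)  # start fresh so reruns don't accumulate lines

matches = ["first", "second", "third"]

# Append mode adds only the new item on each write instead of
# rewriting the whole collection every time.
for item in matches:
    with open(path, "a") as f:
        f.write(item + "\n")

with open(path) as f:
    print(f.read().splitlines())  # ['first', 'second', 'third']
```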

Also you don't need f.close if you use a with statement.

> However, every time I run the current code, I get all the
> textfile(sets.txt) from the previous ( regex ) function, even though all I
> want are words and phrases shared between the textfile from regex and the
> monum list from regexparse. How can I fix this?

I think that's due to the incorrect if expression above.

But I didn't check the rest of the code...

However, I do wonder about your use of soup as your
search string. Isn't soup the parsed html structure?
Is that really what you want to search with your regex?
But I'm no BS expert, so there might be some magic
at work there.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

