Re: [Tutor] When is and isn't "__file__" set?

2018-01-10 Thread Steven D'Aprano
On Wed, Jan 10, 2018 at 08:02:24PM -0600, boB Stepp wrote:

> I am still puzzling over things from the thread, "Why does
> os.path.realpath('test_main.py') give different results for unittest
> than for testing statement in interpreter?"  The basic question I am
> trying to answer is how to determine the path to a particular module
> that is being run.  For the experiments I have run thus far, the
> module attribute, "__file__", has so far reliably given me the
> absolute path to the module being run.  But the documentation suggests
> that this attribute is optional.  So what can I rely on here with
> "__file__"?  The first sentence of the cited quote is not illuminating
> this sufficiently for me.

Modules which are loaded from a .py or .pyc file on disk should always 
have __file__ set. If they don't, that's a bug in the interpreter.

Modules which are loaded from a .dll or .so binary file also should have 
__file__ set.

Modules that you create on the fly like this:

py> from types import ModuleType
py> module = ModuleType('module')
py> module.__file__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'module' has no attribute '__file__'


will not have __file__ set unless you manually set it yourself. Such 
hand-made modules can be stored in databases and retrieved later, in 
which case they still won't have a __file__ attribute.

Module objects which are built into the interpreter itself, like sys, 
also won't have a __file__ attribute:

py> sys.__file__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'sys' has no attribute '__file__'


One tricky, but unusual case, is that Python supports importing 
and running modules loaded from zip files. Very few people know this 
feature even exists -- it is one of Python's best kept secrets -- and 
even fewer know how it works. I'm not sure what happens when you load a 
module from a zip file, whether it will have a __file__ or not.
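
One way to settle the zip-file question is to try it. The sketch below builds a throwaway archive, imports a module from it, and prints whatever __file__ turns out to be. The names "demo_archive.zip" and "zipped_demo" are made up for the experiment. CPython's zipimport machinery is documented to set __file__ to a path inside the archive, but treat that as something to confirm on your own Python version rather than a guarantee.

```python
import importlib
import os
import sys
import tempfile
import zipfile

# Build a throwaway zip archive containing one tiny module.
# "zipped_demo" is a made-up module name used only for this experiment.
tmpdir = tempfile.mkdtemp()
archive = os.path.join(tmpdir, "demo_archive.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zipped_demo.py", "ANSWER = 42\n")

# Putting the archive on sys.path lets the normal import system find the module.
sys.path.insert(0, archive)
importlib.invalidate_caches()
try:
    zipped_demo = importlib.import_module("zipped_demo")
finally:
    sys.path.remove(archive)

print(zipped_demo.ANSWER)
# Use getattr with a default so the script still runs if __file__ is unset.
print(getattr(zipped_demo, "__file__", "<no __file__>"))
```

Note that even when __file__ is set here, it names a location inside the zip file, so it cannot be passed to open() as if it were an ordinary file on disk.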

Basically, if you stick to reading module.__file__ for modules which 
come from a .py file, you should be absolutely fine. But to practice 
defensive programming, something like:

try:
    path = module.__file__
except AttributeError:
    print("handle the case where the module doesn't exist on disk")
else:
    print('handle the case where the module does exist on disk')


might be appropriate.
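
The same defensive idea can be wrapped in a small helper. This is only a sketch: module_path is a hypothetical name, and the choice to return None when there is no file is an assumption about what the caller wants. It also treats a __file__ of None (which some modules, such as namespace packages, can have) the same as a missing attribute.

```python
import json
import os
import sys
from types import ModuleType


def module_path(module):
    """Return the absolute path recorded in module.__file__, or None if unset."""
    filename = getattr(module, "__file__", None)
    if filename is None:
        return None
    return os.path.abspath(filename)


print(module_path(json))                   # a module loaded from a .py file
print(module_path(sys))                    # built into the interpreter
print(module_path(ModuleType("scratch")))  # created on the fly
```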



-- 
Steve
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] When is and isn't "__file__" set?

2018-01-10 Thread boB Stepp
I am actually interested in the answer to this question for Python
versions 2.4, 2.6 and 3.x.

At https://docs.python.org/3/reference/import.html?highlight=__file__#__file__
it says:


__file__ is optional. If set, this attribute’s value must be a string.
The import system may opt to leave __file__ unset if it has no
semantic meaning (e.g. a module loaded from a database).

If __file__ is set, it may also be appropriate to set the __cached__
attribute which is the path to any compiled version of the code (e.g.
byte-compiled file). The file does not need to exist to set this
attribute; the path can simply point to where the compiled file would
exist (see PEP 3147).

It is also appropriate to set __cached__ when __file__ is not set.
However, that scenario is quite atypical. Ultimately, the loader is
what makes use of __file__ and/or __cached__. So if a loader can load
from a cached module but otherwise does not load from a file, that
atypical scenario may be appropriate.


I am still puzzling over things from the thread, "Why does
os.path.realpath('test_main.py') give different results for unittest
than for testing statement in interpreter?"  The basic question I am
trying to answer is how to determine the path to a particular module
that is being run.  For the experiments I have run thus far, the
module attribute, "__file__", has so far reliably given me the
absolute path to the module being run.  But the documentation suggests
that this attribute is optional.  So what can I rely on here with
"__file__"?  The first sentence of the cited quote is not illuminating
this sufficiently for me.


-- 
boB


Re: [Tutor] delete strings from specificed words

2018-01-10 Thread Cameron Simpson

On 09Jan2018 22:20, YU Bo wrote:

The text I am working with is as follows:

```text

[...]

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index a789f952b3e9..443892dabedb 100644

[...]

+++ b/tools/perf/util/util.c

[...]

```
In fact, this is a patch from lkml; my goal is to design a kernel podcast
for myself to focus on what happened in the kernel. I have crawled the text
with Python and want to remove everything from *diff --git* onward, because
after reading the git commit message above it, I already have the shape of
the change in my head.

I have tried split() and replace(), but I have no idea how to deal with it.


Do you have the text as above - a single string - or coming from a file? I'll 
presume a single string.


I would treat the text as lines, particularly since the diff markers etc are 
all line oriented.


So you might write something like this:

 interesting = []
 for line in the_text.splitlines():
   if line.startswith('diff --git '):
     break
   interesting.append(line)

Now the "interesting" list has the lines you want.

There's any number of variations on that you might use, but that should get you 
going.
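
A self-contained version of the loop above, as a rough sketch. The function name text_before_diff and the sample commit message are invented for illustration; only the diff lines come from the text quoted earlier.

```python
def text_before_diff(the_text, marker="diff --git "):
    """Return the part of the_text that comes before the first diff header line."""
    interesting = []
    for line in the_text.splitlines():
        if line.startswith(marker):
            break
        interesting.append(line)
    # Rejoin the kept lines and drop trailing blank lines.
    return "\n".join(interesting).rstrip()


# The first line is a made-up commit message; the rest mimics the quoted patch.
sample = """\
Fix the frobnicator.

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index a789f952b3e9..443892dabedb 100644
+++ b/tools/perf/util/util.c
"""
print(text_before_diff(sample))
```

Because the test is startswith() on each line, a "diff --git" that merely appears in the middle of a sentence in the commit message does not cut the text short.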


Cheers,
Cameron Simpson 


Re: [Tutor] Why does os.path.realpath('test_main.py') give different results for unittest than for testing statement in interpreter?

2018-01-10 Thread Alan Gauld via Tutor
On 10/01/18 20:20, eryk sun wrote:

> ... And working with COM via ctypes is also complex, which is why
> comtypes exists.

Or easier still Pythonwin (aka PyWin32).
I far prefer pythonwin over ctypes for any kind of COM work.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] Why does os.path.realpath('test_main.py') give different results for unittest than for testing statement in interpreter?

2018-01-10 Thread eryk sun
On Wed, Jan 10, 2018 at 12:59 PM, Albert-Jan Roskam wrote:
>
> I tried:
> >>> from os.path import _getfullpathname
> >>> _getfullpathname(r"H:")
> 'h:\\path\\to\\folder'
> >>> import os
> >>> os.getcwd()
> 'h:\\path\\to\\folder'
>
> I expected h:\ to be \\server\share\foo.

You called _getfullpathname (WinAPI GetFullPathName), not
_getfinalpathname (WinAPI GetFinalPathNameByHandle). GetFullPathName
works on the path as a string without touching the filesystem.
GetFinalPathNameByHandle reconstructs a path when given a handle to a
file or directory.

> The fact that the current working directory was returned was even more 
> unexpected.

"H:" or "H:relative/path" is relative to the working directory on
drive H:. The process only has one working directory, but
GetFullPathName also checks for environment variables such as "=H:".
The C runtime's _chdir function sets these magic variables, as does
Python's os.chdir function (we don't call C _chdir). WinAPI
SetCurrentDirectory does not set them. For example:

>>> os.chdir('Z:/Temp')
>>> win32api.GetEnvironmentVariable('=Z:')
'Z:\\Temp'
>>> os.path._getfullpathname('Z:relative')
'Z:\\Temp\\relative'
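
The drive-relative form can also be inspected without a Windows machine. The sketch below uses the ntpath module (the Windows flavour of os.path, importable on any platform) purely as string manipulation. It does not consult any working directory or "=Z:" variable, so it shows only how such paths are classified, not how they are resolved.

```python
import ntpath  # the Windows flavour of os.path; importable on any platform

for path in ("Z:relative", "Z:", r"Z:\Temp\relative"):
    drive, rest = ntpath.splitdrive(path)
    # A path with a drive but no leading separator is drive-relative, not absolute.
    print(repr(path), "drive:", repr(drive), "rest:", repr(rest),
          "absolute:", ntpath.isabs(path))
```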

> Why would anybody *want* the drive letters? They are only useful because (a) 
> they save on
> keystrokes (b) they bypass the annoying limitation of the cd command on 
> windows, ie. it
> does not work with UNC paths.

Windows itself has no problem using a UNC path as the working
directory. That's a limit of the CMD shell.

SUBST drives can be used to access long paths. Since the substitution
occurs in the kernel, it avoids the MAX_PATH 260-character limit. Of
course, you can also use junctions and symlinks to access long paths,
or in Windows 10 simply enable long-path support.

> I know net use, pushd, subst. I use 'net use' for more or less permanent 
> drives and
> pushd/popd to get a temporary drive, available letter (cd nuisance).

`net.exe use` and CMD's PUSHD command (with a UNC path) both call
WinAPI WNetAddConnection2 to create a mapped network drive. The
difference is that net.exe can supply alternate credentials and create
a persistent mapping, while PUSHD uses the current user's credentials
and creates a non-persistent mapping.

If your account gets logged on with a UAC split token, the standard
and elevated tokens actually have separate logon sessions with
separate local-device mappings. You can enable a policy to link the
two logon sessions. Set a DWORD value of 1 named
"EnableLinkedConnections" in the key
"HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\System", and
reboot.

subst.exe creates substitute paths using WinAPI DefineDosDevice.
Unlike the WNet API, this function doesn't use MPR (multiple provider
router) to create a direct link for the network provider (e.g.
\Device\LanmanRedirector); doesn't create a linked connection when
EnableLinkedConnections is defined; and can't create a persistent
drive with stored credentials (though you can use an account logon
script for this). On the plus side, a drive mapped via subst.exe can
target any path.

> Interesting code! I have used the following, which uses SHGetFolderPath, ie. 
> without 'Known'.
> from win32com.shell import shell, shellcon
> desktop = shell.SHGetFolderPath(0, shellcon.CSIDL_DESKTOP, 0, 0)

SHGetFolderPath is usually fine, but still, it's outdated and
deprecated. win32com.shell doesn't wrap SHGetKnownFolderPath for some
reason, but you can still use the new known-folder API without ctypes.
Just create a KnownFolderManager instance. For example:

import pythoncom
from win32com.shell import shell

kfmgr = pythoncom.CoCreateInstance(shell.CLSID_KnownFolderManager, None,
    pythoncom.CLSCTX_INPROC_SERVER, shell.IID_IKnownFolderManager)

desktop_path = kfmgr.GetFolder(shell.FOLDERID_Desktop).GetPath()

This doesn't work as conveniently for getting known folders of other
users. While the high-level SHGetKnownFolderPath function takes care
of loading the user profile and impersonating, we have to do this
ourselves when using a KnownFolderManager instance.

That said, to correct my previous post, you have to be logged on with
SeTcbPrivilege access (e.g. a SYSTEM service) to get and set other
users' known folders without their password. (If you have the password
you can use a regular logon instead of an S4U logon, and that works
fine.)

> Working with ctypes.wintypes is quite complex!

I wouldn't say ctypes is complex in general. But calling LsaLogonUser
is complex due to all of the structs that include variable-sized
buffers. And working with COM via ctypes is also complex, which is why
comtypes exists.


Re: [Tutor] question about metaclasses

2018-01-10 Thread Peter Otten
Albert-Jan Roskam wrote:

> Why does the following line (in #3)

> # 3-
> class Meta(type):
>     def __new__(cls, name, bases, attrs):
>         for attr, obj in attrs.items():
>             if attr.startswith('_'):
>                 continue
>             elif not isinstance(obj, property):
>                 import pdb;pdb.set_trace()
>                 #setattr(cls, attr, property(lambda self: obj))  # incorrect!
>                 raise ValueError("Only properties allowed")
>         return super().__new__(cls, name, bases, attrs)
>
> class MyReadOnlyConst(metaclass=Meta):
>     __metaclass__ = Meta
>     YES = property(lambda self: 1)
>     NO = property(lambda self: 0)
>     DUNNO = property(lambda self: 42)
>     THROWS_ERROR = 666
>
>
> c2 = MyReadOnlyConst()
> print(c2.THROWS_ERROR)
> #c2.THROWS_ERROR = 777
> #print(c2.THROWS_ERROR)

> not convert the normal attribute into
> a property?
> 
> setattr(cls, attr, property(lambda self: obj))  # incorrect!

cls is Meta itself, not MyReadOnlyConst (which is an instance of Meta).
When the code in Meta.__new__() executes MyReadOnlyConst does not yet exist,
but future attributes are already there, in the form of the attrs dict.
Thus to convert the integer value into a read-only property you can 
manipulate that dict (or the return value of super().__new__()):

class Meta(type):
    def __new__(cls, name, bases, attrs):
        for attr, obj in attrs.items():
            if attr.startswith('_'):
                continue
            elif not isinstance(obj, property):
                attrs[attr] = property(lambda self, obj=obj: obj)

        return super().__new__(cls, name, bases, attrs)

class MyReadOnlyConst(metaclass=Meta):
    YES = property(lambda self: 1)
    NO = property(lambda self: 0)
    DUNNO = property(lambda self: 42)
    THROWS_ERROR = 666

c = MyReadOnlyConst()
try:
    c.THROWS_ERROR = 42
except AttributeError:
    pass
else:
    assert False
assert c.THROWS_ERROR == 666

PS: If you don't remember why the obj=obj is necessary:
Python uses late binding; without that trick all lambda functions would 
return the value bound to the obj name when the for loop has completed.
A simplified example:

>>> fs = [lambda: x for x in "abc"]
>>> fs[0](), fs[1](), fs[2]()
('c', 'c', 'c')
>>> fs = [lambda x=x: x for x in "abc"]
>>> fs[0](), fs[1](), fs[2]()
('a', 'b', 'c')
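
The alternative mentioned in parentheses earlier, manipulating the return value of super().__new__() instead of the attrs dict, might look roughly like this. AltMeta and Flags are hypothetical names chosen for the sketch; the approach works because the class exists once super().__new__() has returned, so setattr() has a real target.

```python
class AltMeta(type):
    def __new__(mcls, name, bases, attrs):
        # Create the class first; from here on it exists and can be modified.
        new_cls = super().__new__(mcls, name, bases, attrs)
        for attr, obj in attrs.items():
            if attr.startswith('_') or isinstance(obj, property):
                continue
            # obj=obj binds the current value (the late-binding trick from the PS).
            setattr(new_cls, attr, property(lambda self, obj=obj: obj))
        return new_cls


class Flags(metaclass=AltMeta):
    YES = 1
    NO = 0


flags = Flags()
print(flags.YES, flags.NO)
try:
    flags.YES = 42
except AttributeError as exc:
    print("read-only:", exc)
```

As with the attrs-dict version, the properties protect instances only; assigning to Flags.YES on the class itself still replaces the property.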




Re: [Tutor] question about metaclasses

2018-01-10 Thread Zachary Ware
On Wed, Jan 10, 2018 at 10:08 AM, Albert-Jan Roskam wrote:
> Hi,
>
>
> In another thread on this list I was reminded of types.SimpleNamespace. This 
> is nice, but I wanted to create a bag class with constants that are 
> read-only. My main question is about example #3 below (example #2 just 
> illustrates my thought process). Is this a use case to a metaclass? Or can I 
> do it some other way (maybe a class decorator?). I would like to create a 
> metaclass that converts any non-special attributes (=not starting with '_') 
> into properties, if needed. That way I can specify my bag class in a very 
> clean way: I only specify the metaclass, and I list the attributes as normal 
> attributes, because the metaclass will convert them into properties.

You appear to be reimplementing Enum.

> Why does the following line (in #3) not convert the normal attribute into a 
> property?
>
> setattr(cls, attr, property(lambda self: obj))  # incorrect!

Because `cls` is `Meta`, not `MyReadOnlyConst`; `__new__` is
implicitly a classmethod and `MyReadOnlyConst` doesn't actually exist
yet.  When `MyReadOnlyConst` is created by `type.__new__` it will be
filled with the contents of `attrs`, so instead of `setattr` you want
`attrs[attr] = property(...)`.

But once you're past the learning exercise that this is, just use
enum.Enum or collections.namedtuple :)
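
For comparison, a minimal Enum sketch of the same bag of constants. The class name Answer is made up; the member names and values are taken from the examples in this thread.

```python
from enum import Enum


class Answer(Enum):
    YES = 1
    NO = 0
    DUNNO = 42


print(Answer.YES, Answer.YES.value)
try:
    Answer.YES = 0  # Enum refuses to rebind an existing member
except AttributeError as exc:
    print("read-only:", exc)
```

One difference from the property-based bag: Answer.YES is an enum member rather than the plain integer 1, so the raw value is reached through .value.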

-- 
Zach


Re: [Tutor] question about metaclasses

2018-01-10 Thread Steven D'Aprano
On Wed, Jan 10, 2018 at 04:08:04PM +, Albert-Jan Roskam wrote:

> In another thread on this list I was reminded of 
> types.SimpleNamespace. This is nice, but I wanted to create a bag 
> class with constants that are read-only.

If you expect to specify the names of the constants ahead of time, the 
best solution is (I think) a namedtuple.

from collections import namedtuple
Bag = namedtuple('Bag', 'yes no dunno')
a = Bag(yes=1, no=0, dunno=42)
b = Bag(yes='okay', no='no way', dunno='not a clue')

ought to do what you want.
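
A short sketch of the read-only behaviour, reusing the Bag definition above. Assignment to a field fails, and _replace() is the standard way to get a modified copy.

```python
from collections import namedtuple

Bag = namedtuple('Bag', 'yes no dunno')
a = Bag(yes=1, no=0, dunno=42)

try:
    a.yes = 2  # fields of a namedtuple instance cannot be rebound
except AttributeError as exc:
    print("read-only:", exc)

# _replace() builds a new tuple and leaves the original untouched.
a2 = a._replace(yes=2)
print(a, a2)
```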

Don't make the mistake of doing this:

from collections import namedtuple
a = namedtuple('Bag', 'yes no dunno')(yes=1, no=0, dunno=42)
b = namedtuple('Bag', 'yes no dunno')(yes='okay', no='no way',
                                      dunno='not a clue')

because that's quite wasteful of memory: each of a and b belongs to a 
separate hidden class, and classes are rather largish objects.


If you expect to be able to add new items on the fly, but have them 
read-only once set, that's a different story.
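
One possible shape for that different story, offered only as a sketch: a class whose __setattr__ accepts a name the first time and refuses it afterwards. WriteOnceBag is a hypothetical name, and this is just one of several ways to do it.

```python
class WriteOnceBag:
    """Attributes can be added at any time but never rebound or deleted."""

    def __setattr__(self, name, value):
        if name in self.__dict__:
            raise AttributeError("%r is already set and is read-only" % name)
        super().__setattr__(name, value)

    def __delattr__(self, name):
        raise AttributeError("%r cannot be deleted" % name)


bag = WriteOnceBag()
bag.yes = 1          # first assignment is allowed
try:
    bag.yes = 0      # rebinding is refused
except AttributeError as exc:
    print("read-only:", exc)
print(bag.yes)
```

This is protection against accidents, not against determined code: anyone can still reach into bag.__dict__ directly.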


-- 
Steve


[Tutor] question about metaclasses

2018-01-10 Thread Albert-Jan Roskam
Hi,


In another thread on this list I was reminded of types.SimpleNamespace. This is 
nice, but I wanted to create a bag class with constants that are read-only. My 
main question is about example #3 below (example #2 just illustrates my thought 
process). Is this a use case to a metaclass? Or can I do it some other way 
(maybe a class decorator?). I would like to create a metaclass that converts 
any non-special attributes (=not starting with '_') into properties, if needed. 
That way I can specify my bag class in a very clean way: I only specify the 
metaclass, and I list the attributes as normal attributes, because the metaclass 
will convert them into properties.


Why does the following line (in #3) not convert the normal attribute into a 
property?

setattr(cls, attr, property(lambda self: obj))  # incorrect!



# 1-
# nice, but I want the constants to be read-only
from types import SimpleNamespace
const = SimpleNamespace(YES=1, NO=0, DUNNO=9)
const.YES = 0
print(const)


# 2-
# works, but I wonder if there's a builtin way
class Const(object):
    """Adding attributes is ok, modifying them is not"""
    YES = property(lambda self: 1)
    NO = property(lambda self: 0)
    DUNNO = property(lambda self: 42)
    #THROWS_ERROR = 666

    def __new__(cls):
        for attr in dir(cls):
            if attr.startswith('_'):
                continue
            elif not isinstance(getattr(cls, attr), property):
                raise ValueError("Only properties allowed")
        return super().__new__(cls)
    def __repr__(self):
        kv = ["%s=%s" % (attr, getattr(self, attr)) for \
              attr in sorted(dir(Const)) if not attr.startswith('_')]
        return "ReadOnlyNamespace(" + ", ".join(kv) + ")"

c = Const()
print(repr(c))
#c.YES = 42  # raises AttributeError (desired behavior)

print(c.YES)


# 3-
class Meta(type):
    def __new__(cls, name, bases, attrs):
        for attr, obj in attrs.items():
            if attr.startswith('_'):
                continue
            elif not isinstance(obj, property):
                import pdb;pdb.set_trace()
                #setattr(cls, attr, property(lambda self: obj))  # incorrect!
                raise ValueError("Only properties allowed")
        return super().__new__(cls, name, bases, attrs)

class MyReadOnlyConst(metaclass=Meta):
    __metaclass__ = Meta
    YES = property(lambda self: 1)
    NO = property(lambda self: 0)
    DUNNO = property(lambda self: 42)
    THROWS_ERROR = 666


c2 = MyReadOnlyConst()
print(c2.THROWS_ERROR)
#c2.THROWS_ERROR = 777
#print(c2.THROWS_ERROR)


Thank you in advance and sorry about the large amount of code!


Albert-Jan



Re: [Tutor] Why does os.path.realpath('test_main.py') give different results for unittest than for testing statement in interpreter?

2018-01-10 Thread Albert-Jan Roskam

From: eryk sun 
Sent: Wednesday, January 10, 2018 3:56 AM
To: tutor@python.org
Cc: Albert-Jan Roskam
Subject: Re: [Tutor] Why does os.path.realpath('test_main.py') give different 
results for unittest than for testing statement in interpreter?
  

On Tue, Jan 9, 2018 at 2:48 PM, Albert-Jan Roskam wrote:
>
>> I think that it would be a great enhancement if os.path.realpath would return the 
>> UNC path if
>> given a mapped drive in Windows, if needed as extended path (prefixed with 
>> "\\?\UNC\").
>> That's something I use all the time, unlike symlinks, in Windows.
>
>pathlib can do this for you, or call os.path._getfinalpathname. 

I tried: 
>>> from os.path import _getfullpathname
>>> _getfullpathname(r"H:")
'h:\\path\\to\\folder'
>>> import os
>>> os.getcwd()
'h:\\path\\to\\folder'

I expected h:\ to be \\server\share\foo. The fact that the current working 
directory was returned was even more unexpected.

>>I recently helped someone that wanted the reverse, to map the resolved
>>UNC path back to a logical drive:

Oh dear. Why would anybody *want* the drive letters? They are only useful 
because (a) they save on keystrokes (b) they bypass the annoying limitation of 
the cd command on windows, ie. it does not work with UNC paths. 
Driveletter-->UNC conversion is useful when e.g. logging file paths. I do 
wonder whether the method used to assign the drive letter matters with the . I 
know net use, pushd, subst. I use 'net use' for more or less permanent drives 
and pushd/popd to get a temporary drive, available letter (cd nuisance).

>We can't assume in general that a user's special folders (e.g.
>Desktop, Documents) are in the default location relative to the
>profile directory. Almost all of them are relocatable. There are shell
>APIs to look up the current locations, such as SHGetKnownFolderPath.
>This can be called with ctypes [1]. For users other than the current
>user, it requires logging on and impersonating the user, which
>requires administrator access.
>
>[1]: https://stackoverflow.com/a/33181421/205580

Interesting code! I have used the following, which uses SHGetFolderPath, ie. 
without 'Known'.
from win32com.shell import shell, shellcon
desktop = shell.SHGetFolderPath(0, shellcon.CSIDL_DESKTOP, 0, 0)

Working with ctypes.wintypes is quite complex!




Re: [Tutor] delete strings from specificed words

2018-01-10 Thread YU Bo

Hi,
On Wed, Jan 10, 2018 at 10:37:09AM +0100, Peter Otten wrote:

YU Bo wrote:


index 45a63e0..3b9b238 100644
...
```
I want to delete everything from *diff --git* to the end, because there is
too much code here


Use str.split() or str.partition() and only keep the first part:


>>> text = """The registers rax, rcx and rdx are touched when controlling IBRS
... so they need to be saved when they can't be clobbered.
...
... diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
... index 45a63e0..3b9b238 100644
... """
>>> cleaned_text = text.partition("diff --git")[0].strip()
>>> print(cleaned_text)
The registers rax, rcx and rdx are touched when controlling IBRS
so they need to be saved when they can't be clobbered.


Cool, it is what I want.

Thanks all!

Bo






Re: [Tutor] delete strings from specificed words

2018-01-10 Thread Peter Otten
YU Bo wrote:

> ```text
> The registers rax, rcx and rdx are touched when controlling IBRS
> so they need to be saved when they can't be clobbered.
> 
> diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
> index 45a63e0..3b9b238 100644
> ...
> ```
> I want to delete everything from *diff --git* to the end, because there is
> too much code here

Use str.split() or str.partition() and only keep the first part: 

>>> text = """The registers rax, rcx and rdx are touched when controlling IBRS
... so they need to be saved when they can't be clobbered.
... 
... diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
... index 45a63e0..3b9b238 100644
... """
>>> cleaned_text = text.partition("diff --git")[0].strip()
>>> print(cleaned_text)
The registers rax, rcx and rdx are touched when controlling IBRS
so they need to be saved when they can't be clobbered.
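
One detail worth knowing about str.partition(): when the separator is not found, it returns the whole string as the first element, so a message without any patch passes through unchanged. A small sketch, with strip_patch and both sample strings invented for illustration:

```python
def strip_patch(text, marker="diff --git"):
    """Return text up to the first marker, or all of it if the marker is absent."""
    before, _found, _after = text.partition(marker)
    return before.strip()


with_patch = "Commit message.\n\ndiff --git a/x b/x\nindex 1..2\n"
without_patch = "Just a message, no patch.\n"
print(repr(strip_patch(with_patch)))
print(repr(strip_patch(without_patch)))
```

Unlike a line-by-line startswith() test, this cuts at the first occurrence of the marker anywhere in the text, including the middle of a sentence.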




Re: [Tutor] delete strings from specificed words

2018-01-10 Thread YU Bo

Hi,

First, thank you very much for your reply.

On Tue, Jan 09, 2018 at 10:25:11PM +, Alan Gauld via Tutor wrote:

On 09/01/18 14:20, YU Bo wrote:


But I am facing an interesting question. I have no idea how to deal with it.


I don't think you have given us enough context to
be able to help much. We would need some idea of
the input and output data (both current and desired).





It sounds like you are building some kind of pretty printer.
Maybe you could use Python's pretty-print module as a design
template? Or maybe even use some of it directly. It just
depends on your data formats etc.


Yes. I think python can deal with it directly.




In fact, this is a patch from lkml; my goal is to design a kernel podcast
for myself to focus on what happened in the kernel.


Sorry, I've no idea what lkml is nor what kernel you are talking about.

Can you show us what you are receiving, what you are
currently producing and what you are trying to produce?

Some actual code might be an idea too.
And the python version and OS.


Sorry, I didn't explain it. But my code is terrible.

lkml.py:

```code
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File Name: lkml.py
# Author: Bo Yu

""" This is source code in page that i want to get

"""
import sys
reload(sys)
sys.setdefaultencoding('utf8')

import urllib2
from bs4 import BeautifulSoup
import requests

import chardet
import re

# import myself print function

from get_content import print_content

if __name__ == '__main__':
   comment_url = []
   target = 'https://www.spinics.net/lists/kernel/threads.html'
   req = requests.get(url=target)
   req.encoding = 'utf-8'
   content = req.text
   bf = BeautifulSoup(content ,'lxml') # There is no problem

   context = bf.find_all('strong')
   for ret in context[0:1]:
      for test in ret:
         print '\t'
         x = re.split(' ', str(test))
         y = re.search('"(.+?)"', str(x)).group(1)
         comment_url.append(target.replace("threads.html", str(y)))

   for tmp_target in comment_url:
      print "===This is a new file ==="
      print_content(tmp_target, 'utf-8', 'title')

```
get_content.py:

```code
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File Name: get_content.py

import urllib2
from bs4 import BeautifulSoup
import requests

import chardet
import re

def print_content(url, charset, find_id):
   req = requests.get(url=url)
   req.encoding = charset
   content = req.text
   bf = BeautifulSoup(content ,'lxml')
   article_title = bf.find('h1')
   #author = bf.find_all('li')
   commit = bf.find('pre')
   print '\t'
   print article_title.get_text()
   print '\t'
   x = str(commit.get_text())
   print x
```
python --version: Python 2.7.13
OS: debian 9
usage: python lkml.py
output: oh...
https://pastecode.xyz/view/04645424

Please ignore my print debug format.

This is my code, and I can get text like the output above.
So, to put my question simply:
I don't know how to delete the text after a specific word, for example:

```text
The registers rax, rcx and rdx are touched when controlling IBRS
so they need to be saved when they can't be clobbered.

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 45a63e0..3b9b238 100644
...
```
I want to delete everything from *diff --git* to the end, because there is too much code here.

Whatever, thanks!




--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

