[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2019-04-27 Thread Stefan Behnel


Stefan Behnel  added the comment:

Closing as a duplicate of the more general issue 18304.

--
resolution:  -> duplicate
stage:  -> resolved
status: open -> closed
superseder:  -> ElementTree -- provide a way to ignore namespace in tags and 
searches

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2015-02-10 Thread Martin Panter

Martin Panter added the comment:

See also Issue 18304 for more discussion on simplifying namespaces.

--
nosy: +vadmium

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue8583
___



[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2012-07-21 Thread Florent Xicluna

Florent Xicluna florent.xicl...@gmail.com added the comment:

See also issue 13378 which proposes custom namespace maps for serializing.

--
components: +XML
nosy: +eli.bendersky
versions: +Python 3.4 -Python 3.2




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2011-05-28 Thread Stefan Behnel

Stefan Behnel sco...@users.sourceforge.net added the comment:

I don't see this having much to do with the DRY principle. It's "explicit is 
better than implicit" and "better safe than sorry" that apply here.

--




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2011-05-28 Thread Martin v . Löwis

Martin v. Löwis mar...@v.loewis.de added the comment:

I recommend reverting this change. It seems that some users are opposed to any 
kind of folding (as my earlier folding experiment has demonstrated); users who 
*really* don't want to see the history would need to step forward and request a 
per-user option to suppress it.

I think the real solution to information flooding here is to have users split 
larger issues into smaller ones, only discuss one issue at a time, etc.

--
nosy: +loewis




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2011-05-28 Thread Martin v . Löwis

Changes by Martin v. Löwis mar...@v.loewis.de:


--
Removed message: http://bugs.python.org/msg137107




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2011-05-28 Thread library.engine

library.engine library.eng...@gmail.com added the comment:

What is so implicit in passing a list of undesired namespaces to the parse 
function?
This is quite explicit, in my humble opinion, and it also saves you from 
repeating yourself for each and every tag you want to find in the tree.
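
For comparison, a minimal sketch of how the repetition can be reduced with the stock library, without changing the parser: the find functions of xml.etree.ElementTree accept a prefix-to-URI mapping, so the URI is written once. The prefix "v" and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical prefix and sample document, chosen for illustration only.
NS = {"v": "http://www.very_long_url.com"}
doc = '<root xmlns="http://www.very_long_url.com"><child>text</child></root>'

root = ET.fromstring(doc)

# The URI appears once in NS; lookups use the short prefix instead.
child = root.find("v:child", NS)
child_text = child.text
child_tags = [e.tag for e in root.findall("v:child", NS)]
```

The element tags themselves are still fully qualified; only the search expressions get shorter.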

--




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2011-05-27 Thread library.engine

library.engine library.eng...@gmail.com added the comment:

I second the request for tag names not prefixed with a root namespace in Python, 
mostly because of the ugly code, as the performance degradation is negligible on 
relatively small files. But this ubiquitous repetition (even if you're 
appending a variable to every tag name) is just against the DRY principle, and 
I don't like it.
I think an extra option to pass a list of namespaces that should NOT be 
prepended to the tag names would be sufficient.
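
The behaviour requested here can be approximated after parsing. The following is only a sketch under stated assumptions: strip_namespaces is a hypothetical helper, not an ElementTree function, and it rewrites element tags only (attribute names are left alone).

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root, unwanted):
    """Remove the listed namespace URIs from element tags, in place.

    Hypothetical helper for illustration; not part of ElementTree.
    """
    prefixes = tuple("{%s}" % uri for uri in unwanted)
    for elem in root.iter():
        # Comments and processing instructions have non-string tags.
        if isinstance(elem.tag, str) and elem.tag.startswith(prefixes):
            elem.tag = elem.tag.split("}", 1)[1]
    return root

doc = ('<root xmlns="http://www.very_long_url.com" '
       'xmlns:o="http://example.com/other">'
       '<child/><o:child/></root>')
root = strip_namespaces(ET.fromstring(doc), ["http://www.very_long_url.com"])
tags = [e.tag for e in root.iter()]
```

Only the namespaces named in the list are dropped, so elements from other namespaces stay distinguishable.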

--
nosy: +library.engine




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-08-04 Thread Terry J. Reedy

Changes by Terry J. Reedy tjre...@udel.edu:


--
type: performance -> feature request
versions: +Python 3.2 -Python 2.5, Python 2.6, Python 2.7




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-05-02 Thread Stefan Behnel

Stefan Behnel sco...@users.sourceforge.net added the comment:

There is at least one valid use case: code that needs to deal with HTML and 
XHTML currently has to normalise the tag names in some way, which usually means 
that it will want to remove the namespaces from XHTML documents to make it look 
like plain HTML. It would be nice if the library could do this efficiently 
right in the parser by simply removing all namespace declarations. However, 
this doesn't really apply to (c)ElementTree where the parser does not support 
HTML parsing.

I'm -1 on the interface that the proposed patch adds. The keyword argument name 
and its semantics are badly chosen. A boolean flag will work much better.

The proposed feature will have to be used with great care by users. Code that 
depends on it is very fragile and will fail when an input document uses 
unexpected namespaces, e.g. to embed foreign content, or because it is actually 
written in a different XML language that just happens to have similar local tag 
names. This kind of code is rather hard to fix, as fixing it means that it will 
stop accepting documents that previously passed without problems. Rejecting 
broken input early is a virtue.
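
To illustrate the concern, here is a small sketch of rejecting unexpected namespaces before any of them are ignored. check_namespaces is a hypothetical helper and the XHTML/SVG document is an assumed example of embedded foreign content.

```python
import xml.etree.ElementTree as ET

def check_namespaces(root, expected):
    """Raise ValueError if any element is in a namespace not in `expected`.

    Hypothetical helper for illustration; an un-namespaced tag counts as "".
    """
    for elem in root.iter():
        tag = elem.tag
        if not isinstance(tag, str):
            continue  # skip comments and processing instructions
        uri = tag[1:].split("}", 1)[0] if tag.startswith("{") else ""
        if uri not in expected:
            raise ValueError("unexpected namespace %r in tag %r" % (uri, tag))

XHTML = "http://www.w3.org/1999/xhtml"
SVG = "http://www.w3.org/2000/svg"
doc = ('<html xmlns="%s"><body><title>page</title>'
       '<svg xmlns="%s"><title>drawing</title></svg></body></html>'
       % (XHTML, SVG))
root = ET.fromstring(doc)

# Two different "title" elements share a local name once URIs are ignored.
same_local_name = [e.tag for e in root.iter()
                   if e.tag.rpartition("}")[2] == "title"]

try:
    check_namespaces(root, {XHTML})
    rejected = False
except ValueError:
    rejected = True
```

With the namespaces dropped in the parser, the two title elements could no longer be told apart and the check above would have nothing to work with.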

All in all, I'm -0.5 on this feature as I'd expect most use cases to be 
premature optimisations with potentially dangerous side effects more than 
anything else.

--
nosy: +scoder




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-05-02 Thread Dmitry Chichkov

Dmitry Chichkov dchich...@gmail.com added the comment:

I agree that the argument name choice is poor. But it has already been made by 
whoever coded the EXPAT parser which cElementTree.XMLParser wraps. So there is 
not much room here.

As to 'proposed feature will have to be used with great care by users' - this 
is already taken care of. If you look, the cElementTree.XMLParser class is a 
rather obscure one. As I understand it, it is only used by users requiring 
high-performance XML parsing for large datasets (10GB - 10TB range) in 
data-mining applications.

--




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-05-02 Thread Dmitry Chichkov

Dmitry Chichkov dchich...@gmail.com added the comment:

Interestingly, in precisely these applications you often don't care about 
namespaces at all. Often all you need is to extract 'text' or 'name' elements 
regardless of the namespace.
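
As a point of reference, later versions of xml.etree.ElementTree (Python 3.8 and newer) accept a "{*}" wildcard in search paths, which covers this use case without touching the parser. The sample document below is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with 'name' elements in two namespaces.
doc = ('<feed xmlns="http://www.very_long_url.com" '
       'xmlns:o="http://example.com/other">'
       '<entry><name>first</name><o:name>second</o:name></entry></feed>')
root = ET.fromstring(doc)

# "{*}name" matches elements called "name" in any namespace, or in none.
names = [e.text for e in root.findall(".//{*}name")]
```

The tree still carries the full namespace information; only the search ignores it.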

--




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-05-01 Thread Fredrik Lundh

Fredrik Lundh fred...@effbot.org added the comment:

Namespaces are a fundamental part of the XML information model (both xpath and 
infoset) and all modern XML document formats, so I'm not sure what problem 
you're trying to solve by pretending that they don't exist.

It's a bit like modifying "import foo" to work like "from foo import *"...

--
nosy: +effbot




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-05-01 Thread Dmitry Chichkov

Dmitry Chichkov dchich...@gmail.com added the comment:

This patch does not modify the existing behavior of the library. The 
namespace_separator parameter is optional. The parameter already exists in the 
EXPAT library, but it is hard-coded in the cElementTree.XMLParser code.

Fredrik, yes, namespaces are a fundamental part of the XML information model. 
Yet the option to ignore them is very valuable in performance-critical code.

--




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-04-30 Thread Dmitry Chichkov

New submission from Dmitry Chichkov dchich...@gmail.com:

The namespace_separator parameter is hard-coded in the cElementTree.XMLParser 
class, which makes it impossible to ignore XML namespaces with the cElementTree 
library.

Here's the code example:
 from xml.etree.cElementTree import iterparse
 from StringIO import StringIO
 xml = '<root xmlns="http://www.very_long_url.com"><child/></root>'
 for event, elem in iterparse(StringIO(xml)): print event, elem

It produces:
 end <Element '{http://www.very_long_url.com}child' at 0xb7ddfa58>
 end <Element '{http://www.very_long_url.com}root' at 0xb7ddfa40>

In the current implementation local tags get forcibly concatenated with URIs, 
often resulting in ugly code on the user's side and in performance degradation 
(at least due to extra concatenations and extra lengthy compare operations in 
the element-matching code).

Internally cElementTree uses the EXPAT parser, which does namespace processing 
only optionally, enabled by providing a value for the namespace_separator 
argument. This argument is hard-coded in cElementTree: 
 self->parser = EXPAT(ParserCreate_MM)(encoding, &memory_handler, "}");
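
The effect of this argument can be observed directly with the standard xml.parsers.expat module, whose ParserCreate() takes the same namespace_separator parameter. This is a minimal sketch; the element_names helper is hypothetical and the document is the one used in the report.

```python
import xml.parsers.expat

DOC = '<root xmlns="http://www.very_long_url.com"><child/></root>'

def element_names(namespace_separator):
    """Return the element names expat reports for DOC (illustrative helper)."""
    names = []
    parser = xml.parsers.expat.ParserCreate(
        namespace_separator=namespace_separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(DOC, True)
    return names

# With a separator, expat reports URI + separator + local name.
with_ns = element_names("}")
# With None, namespace processing is off and raw tag names are reported.
without_ns = element_names(None)
```

ElementTree adds the leading "{" itself, which is how the "{uri}local" tag form comes about.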

Well, attached is a patch exposing this parameter in the 
cElementTree.XMLParser() arguments. This parameter is optional and the default 
behavior should be unchanged.  Here's the test code:

import cElementTree

x = '<root xmlns="http://www.very_long_url.com"><child>text</child></root>'

parser = cElementTree.XMLParser()
parser.feed(x)
elem = parser.close()
print elem

parser = cElementTree.XMLParser(namespace_separator="}")
parser.feed(x)
elem = parser.close()
print elem

parser = cElementTree.XMLParser(namespace_separator=None)
parser.feed(x)
elem = parser.close()
print elem

The resulting output:
<Element '{http://www.very_long_url.com}root' at 0xb7e885f0>
<Element '{http://www.very_long_url.com}root' at 0xb7e88608>
<Element 'root' at 0xb7e88458>

--
components: Library (Lib)
messages: 104671
nosy: dmtr
priority: normal
severity: normal
status: open
title: Hardcoded namespace_separator in the cElementTree.XMLParser
type: performance
versions: Python 2.5, Python 2.6, Python 2.7




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-04-30 Thread Brian Curtin

Changes by Brian Curtin cur...@acm.org:


--
nosy: +flox




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-04-30 Thread Dmitry Chichkov

Changes by Dmitry Chichkov dchich...@gmail.com:


--
keywords: +patch
Added file: http://bugs.python.org/file17153/issue-8583.patch




[issue8583] Hardcoded namespace_separator in the cElementTree.XMLParser

2010-04-30 Thread Dmitry Chichkov

Dmitry Chichkov dchich...@gmail.com added the comment:

And obviously iterparse can be either overridden in the local user code or 
patched in the library. Here's the iterparse code/test code:

import cElementTree
from cStringIO import StringIO

class iterparse(object):
    root = None
    def __init__(self, file, events=None, namespace_separator="}"):
        if not hasattr(file, 'read'):
            file = open(file, 'rb')
        self._file = file
        self._events = events
        self._namespace_separator = namespace_separator
    def __iter__(self):
        events = []
        b = cElementTree.TreeBuilder()
        # namespace_separator is the argument added by the attached patch
        p = cElementTree.XMLParser(b, namespace_separator= \
                                   self._namespace_separator)
        p._setevents(events, self._events)
        while 1:
            data = self._file.read(16384)
            if not data:
                break
            p.feed(data)
            for event in events:
                yield event
            del events[:]
        root = p.close()
        for event in events:
            yield event
        self.root = root


x = '<root xmlns="http://www.very_long_url.com"><child>text</child></root>'
context = iterparse(StringIO(x), events=("start", "end", "start-ns"))
for event, elem in context: print event, elem

context = iterparse(StringIO(x), events=("start", "end", "start-ns"),
                    namespace_separator=None)
for event, elem in context: print event, elem


It produces:
start-ns ('', 'http://www.very_long_url.com')
start <Element '{http://www.very_long_url.com}root' at 0xb7ccf650>
start <Element '{http://www.very_long_url.com}child' at 0xb7ccf5a8>
end <Element '{http://www.very_long_url.com}child' at 0xb7ccf5a8>
end <Element '{http://www.very_long_url.com}root' at 0xb7ccf650>
start <Element 'root' at 0xb7ccf620>
start <Element 'child' at 0xb7ccf458>
end <Element 'child' at 0xb7ccf458>
end <Element 'root' at 0xb7ccf620>

Note that in the 'namespace_separator = None' version the URIs are absent and 
no start-ns events are reported.
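
For readers without the patch, the unpatched event stream can be reproduced with the stock xml.etree.ElementTree.iterparse in current Python. This sketch only shows the default behaviour; the local names are derived afterwards by string splitting, which is an illustration and not what the patch does.

```python
import io
import xml.etree.ElementTree as ET

doc = b'<root xmlns="http://www.very_long_url.com"><child>text</child></root>'

events = []
local_names = []
stream = ET.iterparse(io.BytesIO(doc), events=("start", "end", "start-ns"))
for event, payload in stream:
    if event == "start-ns":
        # payload is a (prefix, uri) tuple
        events.append((event, payload))
    else:
        # payload is an Element with a fully qualified tag
        events.append((event, payload.tag))
        if event == "end":
            local_names.append(payload.tag.rpartition("}")[2])
```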

--
