Hello,

Here is the Python code.

The code works as expected most of the time.

Regards,
William Johnston

From: Slide
Sent: Wednesday, December 16, 2015 6:19 PM
To: William Johnston ; ironpython-users@python.org
Subject: Re: [Ironpython-users] debug C# IronPython app?

Can you give the code for the tagger method?

On Wed, Dec 16, 2015 at 4:13 PM William Johnston <willi...@tenbase2.com> wrote:

Hello,

Here it is:

private dynamic textrazortagger = runtime.UseFile(@"E:\Users\William Johnston\Documents\Visual Studio 2010\Projects\Visual Studio Projects\TextRazor\pos.py");

Sincerely,
William Johnston

From: Slide
Sent: Wednesday, December 16, 2015 5:37 PM
To: William Johnston ; ironpython-users@python.org
Subject: Re: [Ironpython-users] debug C# IronPython app?

Forgive my ignorance, but what is textrazortagger?

On Wed, Dec 16, 2015 at 2:23 PM William Johnston <willi...@tenbase2.com> wrote:

Hello,

Here is my code:

public List<MyPythonTuple> TextRazorTagger(string str)
{
    List<MyPythonTuple> ret = new List<MyPythonTuple>();

    try
    {
        IronPython.Runtime.List results = textrazortagger.tagger(str);

        foreach (IronPython.Runtime.PythonTuple tuple in results)
        {
            string strWord = (string)tuple[0];
            string strPos = (string)tuple[1];

            if (strPos.Length == 0)
            {
                continue;
            }

            MyPythonTuple myResult = new MyPythonTuple();
            myResult.Word = strWord;
            myResult.Pos = strPos;
            ret.Add(myResult);
        }
    }
    catch (Exception ex)
    {
        throw new Exception(ex.Message);
    }

    return ret;
}

tuple[1] returns an empty string.

Thanks,
William Johnston

From: Slide
Sent: Tuesday, December 15, 2015 5:13 PM
To: William Johnston ; ironpython-users@python.org
Subject: Re: [Ironpython-users] debug C# IronPython app?

Microsoft isn't really involved with IronPython anymore; it's a completely open source project with no developers from MS really spending time on it (Dino does help out). Providing some code might allow someone on this list to help out.

slide

On Tue, Dec 15, 2015 at 1:53 PM William Johnston <willi...@tenbase2.com> wrote:

Hello,

A C# DLR app is not returning results correctly. (A part of speech tagger is returning an empty string for the actual pos for certain strings.) (The second PythonTuple value from an IronPython List is empty.)

However, the Python script does run from a Python Shell.

How would I go about debugging the app?

Do you know if Microsoft provides paid support?

Thanks.

Regards,
William Johnston

_______________________________________________
Ironpython-users mailing list
Ironpython-users@python.org
https://mail.python.org/mailman/listinfo/ironpython-users
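One possible way to narrow down the empty-tag symptom from the Python side is sketched here. It is an illustration under assumptions, not part of the attached files: normalize_tags, untagged_tokens, MISSING_POS and the FakeWord stand-in are hypothetical names. The only fact taken from the attached textrazor.py is that Word.part_of_speech falls back to None when the response carries no "partOfSpeech" key. The sketch keeps every token and replaces a missing or empty tag with a visible marker, so the C# caller can see which words came back untagged instead of skipping them.

```python
from collections import namedtuple

# Hypothetical stand-in for textrazor.Word, so the sketch runs without an API key.
FakeWord = namedtuple("FakeWord", ["token", "part_of_speech"])

# Hypothetical marker for words that came back without a tag.
MISSING_POS = "UNKNOWN"


def normalize_tags(words, missing=MISSING_POS):
    """Return (token, pos) tuples, replacing a None or empty tag with `missing`."""
    tagged = []
    for word in words:
        pos = word.part_of_speech
        if not pos:
            pos = missing
        tagged.append((word.token, pos))
    return tagged


def untagged_tokens(tagged, missing=MISSING_POS):
    """List the tokens whose tag was replaced, for comparison with a shell run."""
    return [token for token, pos in tagged if pos == missing]


sample = [FakeWord("Hello", "UH"), FakeWord("world", None), FakeWord("!", "")]
tagged = normalize_tags(sample)
```

In the attached pos.py this would amount to having tagger return normalize_tags(response.words()). Whether the empty values originate in the API response or in the IronPython-to-C# conversion is not established in the thread, so the marker only makes the affected words visible; it does not explain them.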
from textrazor import TextRazor

# basic instantiation. TODO Put your authentication keys here.
client = TextRazor("", extractors=["words"])

def tagger(text):
    response = client.analyze(text)
    words = response.words()
    seen = []
    for word in words:
        print word.token, word.part_of_speech
        seen.append((word.token, word.part_of_speech))
    return seen


"""
Copyright (c) 2014 TextRazor, http://textrazor.com/

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""

try:
    from urllib2 import Request, urlopen, HTTPError, URLError
    from urllib import urlencode
except ImportError:
    from urllib.request import Request, urlopen
    from urllib.parse import urlencode
    from urllib.error import HTTPError, URLError

import re

try:
    import simplejson as json
except ImportError:
    import json

import cStringIO as IOStream
import gzip

class TextRazorAnalysisException(BaseException):
    pass

class Topic(object):
    """Represents a single abstract topic extracted from the input text.

    Requires the "topics" extractor to be added to the TextRazor request.
    """

    def __init__(self, topic_json, link_index):
        self._topic_json = topic_json

        for callback, arg in link_index.get(("topic", self.id), []):
            callback(arg, self)

    @property
    def id(self):
        """The unique id of this annotation within its annotation set."""
        return self._topic_json.get("id", None)

    @property
    def label(self):
        """Returns the label for this topic."""
        return self._topic_json.get("label", "")

    @property
    def wikipedia_link(self):
        """Returns a link to Wikipedia for this topic, or None if this topic couldn't be linked to a wikipedia page."""
        return self._topic_json.get("wikiLink", None)

    @property
    def score(self):
        """Returns the relevancy score of this topic to the query document."""
        return self._topic_json.get("score", 0)

    def __repr__(self):
        return "TextRazor Topic %s with label %s" % (str(self.id), str(self.label))

    def __str__(self):
        out = ["TextRazor Topic %s and label %s:" % (str(self.id), str(self.label)), "\n"]

        for property in dir(self):
            if not property.startswith("_") and not property == "id":
                out.extend([property, ":", repr(getattr(self, property)), "\n"])

        return " ".join(out)

class Entity(object):
    """Represents a single "Named Entity" extracted from the input text.

    Requires the "entities" extractor to be added to the TextRazor request.
    """

    def __init__(self, entity_json, link_index):
        self._response_entity = entity_json
        self._matched_words = []

        for callback, arg in link_index.get(("entity", self.document_id), []):
            callback(arg, self)

        for position in self.matched_positions:
            try:
                link_index[("word", position)].append((self._register_link, None))
            except KeyError as ex:
                link_index[("word", position)] = [(self._register_link, None)]

    def _register_link(self, dummy, word):
        self._matched_words.append(word)
        word._add_entity(self)

    @property
    def document_id(self):
        return self._response_entity.get("id", None)

    @property
    def id(self):
        """Returns the disambiguated ID for this entity, or None if this entity could not be disambiguated."""
        return self._response_entity.get("entityId", None)

    @property
    def freebase_id(self):
        """Returns the disambiguated Freebase ID for this entity, or None if either this entity could not be disambiguated, or a Freebase link doesn't exist."""
        return self._response_entity.get("freebaseId", None)

    @property
    def wikipedia_link(self):
        """Returns a link to Wikipedia for this entity, or None if either this entity could not be disambiguated or a Wikipedia link doesn't exist."""
        return self._response_entity.get("wikiLink", None)

    @property
    def matched_text(self):
        """Returns the source text string that matched this entity."""
        return self._response_entity.get("matchedText", None)

    @property
    def starting_position(self):
        return self._response_entity.get("startingPos", None)

    @property
    def ending_position(self):
        return self._response_entity.get("endingPos", None)

    @property
    def matched_positions(self):
        """Returns a list of the token positions in the current sentence that make up this entity."""
        return self._response_entity.get("matchingTokens", [])

    @property
    def matched_words(self):
        """Returns a list of :class:`Word` that make up this entity."""
        return self._matched_words

    @property
    def freebase_types(self):
        """Returns a list of Freebase types for this entity, or an empty list if there are none."""
        return self._response_entity.get("freebaseTypes", [])

    @property
    def relevance_score(self):
        """Returns the relevance this entity has to the source text. This is a float on a scale of 0 to 1,
        with 1 being the most relevant. Relevance is determined by the contextual similarity between the
        entity's context and facts in the TextRazor knowledgebase."""
        return self._response_entity.get("relevanceScore", None)

    @property
    def confidence_score(self):
        """Returns the confidence that TextRazor is correct that this is a valid entity.
        TextRazor uses an ever increasing number of signals to help spot valid entities,
        all of which contribute to this score.
        These include the contextual agreement between the words in the source text and our knowledgebase,
        agreement between other entities in the text, agreement between the expected entity type and context,
        prior probabilities of having seen this entity across wikipedia and other web datasets.
        The score ranges from 0.5 to 10, with 10 representing the highest confidence that this is a valid entity."""
        return self._response_entity.get("confidenceScore", None)

    @property
    def dbpedia_types(self):
        """Returns a list of dbpedia types for this entity, or an empty list if there are none."""
        return self._response_entity.get("type", [])

    @property
    def data(self):
        """Returns a dictionary containing enriched data found for this entity."""
        return self._response_entity.get("data", {})

    def __repr__(self):
        return "TextRazor Entity %s at positions %s" % (self.id.encode("utf-8"), str(self.matched_positions))

    def __str__(self):
        out = ["TextRazor Entity with id:", self.id.encode("utf-8"), "\n"]

        for property in dir(self):
            if not property.startswith("_") and not property == "id":
                out.extend([property, ":", repr(getattr(self, property)), "\n"])

        return " ".join(out)

class Entailment(object):
    """Represents a single "entailment" derived from the source text.

    Requires the "entailments" extractor to be added to the TextRazor request.
    """

    def __init__(self, entailment_json, link_index):
        self.entailment_json = entailment_json
        self._matched_words = []

        for callback, arg in link_index.get(("entailment", self.id), []):
            callback(arg, self)

        for position in self.matched_positions:
            try:
                link_index[("word", position)].append((self._register_link, None))
            except KeyError as ex:
                link_index[("word", position)] = [(self._register_link, None)]

    def _register_link(self, dummy, word):
        self._matched_words.append(word)
        word._add_entailment(self)

    @property
    def matched_positions(self):
        """Returns the token positions in the current sentence that generated this entailment."""
        return self.entailment_json.get("wordPositions", [])

    @property
    def matched_words(self):
        """Returns links to the :class:`Word` in the current sentence that generated this entailment."""
        return self._matched_words

    @property
    def id(self):
        """The unique id of this annotation within its annotation set."""
        return self.entailment_json.get("id", None)

    @property
    def prior_score(self):
        """Returns the score of this entailment independent of the context it is used in this sentence."""
        return self.entailment_json.get("priorScore", None)

    @property
    def context_score(self):
        """Returns the score of agreement between the source word's usage in this sentence and the entailed words usage in our knowledgebase."""
        return self.entailment_json.get("contextScore", None)

    @property
    def score(self):
        """Returns the overall confidence that TextRazor is correct that this is a valid entailment, a combination of the prior and context score."""
        return self.entailment_json.get("score", None)

    @property
    def entailed_word(self):
        """Returns the word string that is entailed by the source words."""
        entailed_tree = self.entailment_json.get("entailedTree", None)
        if entailed_tree:
            return entailed_tree.get("word", None)

    def __repr__(self):
        return "TextRazor Entailment:\"%s\" at positions %s" % (str(self.entailed_word), str(self.matched_positions))

    def __str__(self):
        out = ["TextRazor Entailment:", str(self.entailed_word), "\n"]

        for property in dir(self):
            if not property.startswith("_") and not property == "id":
                out.extend([property, ":", repr(getattr(self, property)), "\n"])

        return " ".join(out)

class RelationParam(object):
    """Represents a Param to a specific :class:`Relation`.

    Requires the "relations" extractor to be added to the TextRazor request."""

    def __init__(self, param_json, relation_parent, link_index):
        self._param_json = param_json
        self._relation_parent = relation_parent
        self._param_words = []

        for position in self.param_positions:
            try:
                link_index[("word", position)].append((self._register_link, None))
            except KeyError as ex:
                link_index[("word", position)] = [(self._register_link, None)]

    def _register_link(self, dummy, word):
        self._param_words.append(word)
        word._add_relation_param(self)

    @property
    def relation_parent(self):
        """Returns the :class:`Relation` that owns this param."""
        return self._relation_parent

    @property
    def relation(self):
        """Returns the relation of this param to the predicate:
        Possible values: SUBJECT, OBJECT, OTHER"""
        return self._param_json.get("relation", None)

    @property
    def param_positions(self):
        """Returns a list of the positions of the words in this param within their sentence."""
        return self._param_json.get("wordPositions", [])

    @property
    def param_words(self):
        """Returns a list of all the :class:`Word` that make up this param."""
        return self._param_words

    def entities(self):
        """Returns a generator of all :class:`Entity` mentioned in this param."""
        seen = set()
        for word in self.param_words:
            for entity in word.entities:
                if entity not in seen:
                    seen.add(entity)
                    yield entity

    def __repr__(self):
        return "TextRazor RelationParam:\"%s\" at positions %s" % (str(self.relation), str(self.param_words))

    def __str__(self):
        return repr(self)

class NounPhrase(object):
    """Represents a multi-word phrase extracted from a sentence.
    Requires the "relations" extractor to be added to the TextRazor request."""

    def __init__(self, noun_phrase_json, link_index):
        self._noun_phrase_json = noun_phrase_json
        self._words = []

        for callback, arg in link_index.get(("nounPhrase", self.id), []):
            callback(arg, self)

        for position in self.word_positions:
            try:
                link_index[("word", position)].append((self._register_link, None))
            except KeyError as ex:
                link_index[("word", position)] = [(self._register_link, None)]

    def _register_link(self, dummy, word):
        self._words.append(word)
        word._add_noun_phrase(self)

    @property
    def id(self):
        """The unique id of this annotation within its annotation set."""
        return self._noun_phrase_json.get("id", None)

    @property
    def word_positions(self):
        """Returns a list of the positions of the words in this phrase."""
        return self._noun_phrase_json.get("wordPositions", [])

    @property
    def words(self):
        """Returns a list of :class:`Word` that make up this phrase."""
        return self._words

    def __repr__(self):
        return "TextRazor NounPhrase at positions %s" % (str(self.words))

    def __str__(self):
        out = ["TextRazor NounPhrase:", str(self.word_positions), "\n"]

        for property in dir(self):
            if not property.startswith("_") and not property == "word_positions":
                out.extend([property, ":", repr(getattr(self, property)), "\n"])

        return " ".join(out)

class Property(object):
    """Represents a property relation extracted from raw text. A property implies an "is-a" or "has-a"
    relationship between the predicate (or focus) and its property.

    Requires the "relations" extractor to be added to the TextRazor request.
    """

    def __init__(self, property_json, link_index):
        self._property_json = property_json
        self._predicate_words = []
        self._property_words = []

        for callback, arg in link_index.get(("property", self.id), []):
            callback(arg, self)

        for position in self.predicate_positions:
            try:
                link_index[("word", position)].append((self._register_link, True))
            except KeyError as ex:
                link_index[("word", position)] = [(self._register_link, True)]

        for position in self.property_positions:
            try:
                link_index[("word", position)].append((self._register_link, False))
            except KeyError as ex:
                link_index[("word", position)] = [(self._register_link, False)]

    def _register_link(self, is_predicate, word):
        if is_predicate:
            self._predicate_words.append(word)
            word._add_property_predicate(self)
        else:
            self._property_words.append(word)
            word._add_property_properties(self)

    @property
    def id(self):
        """The unique id of this annotation within its annotation set."""
        return self._property_json.get("id", None)

    @property
    def predicate_positions(self):
        """Returns a list of the positions of the words in the predicate (or focus) of this property."""
        return self._property_json.get("wordPositions", [])

    @property
    def predicate_words(self):
        """Returns a list of TextRazor words that make up the predicate (or focus) of this property."""
        return self._predicate_words

    @property
    def property_positions(self):
        """Returns a list of word positions that make up the modifier of the predicate of this property."""
        return self._property_json.get("propertyPositions", [])

    @property
    def property_words(self):
        """Returns a list of :class:`Word` that make up the property that targets the focus words."""
        return self._property_words

    def __repr__(self):
        return "TextRazor Property at positions %s" % (str(self.predicate_positions))

    def __str__(self):
        out = ["TextRazor Property:", str(self.predicate_positions), "\n"]

        for property in dir(self):
            if not property.startswith("_") and not property == "predicate_positions":
                out.extend([property, ":", repr(getattr(self, property)), "\n"])

        return " ".join(out)

class Relation(object):
    """Represents a grammatical relation between words. Typically owns a number of
    :class:`RelationParam`, representing the SUBJECT and OBJECT of the relation.

    Requires the "relations" extractor to be added to the TextRazor request."""

    def __init__(self, relation_json, link_index):
        self._relation_json = relation_json

        self._params = [RelationParam(param, self, link_index) for param in relation_json["params"]]
        self._predicate_words = []

        for callback, arg in link_index.get(("relation", self.id), []):
            callback(arg, self)

        for position in self.predicate_positions:
            try:
                link_index[("word", position)].append((self._register_link, None))
            except KeyError as ex:
                link_index[("word", position)] = [(self._register_link, None)]

    def _register_link(self, dummy, word):
        self._predicate_words.append(word)
        word._add_relation(self)

    @property
    def id(self):
        """The unique id of this annotation within its annotation set."""
        return self._relation_json.get("id", None)

    @property
    def predicate_positions(self):
        """Returns a list of the positions of the predicate words in this relation within their sentence."""
        return self._relation_json.get("wordPositions", [])

    @property
    def predicate_words(self):
        """Returns a list of the TextRazor words in this relation."""
        return self._predicate_words

    @property
    def params(self):
        """Returns a list of the TextRazor params of this relation."""
        return self._params

    def __repr__(self):
        return "TextRazor Relation at positions %s" % (str(self.predicate_words))

    def __str__(self):
        out = ["TextRazor Relation:", str(self.predicate_words), "\n"]

        for property in dir(self):
            if not property.startswith("_") and not property == "predicate_positions":
                out.extend([property, ":", repr(getattr(self, property)), "\n"])

        return " ".join(out)

class Word(object):
    """Represents a single Word (token) extracted by TextRazor.
    Requires the "words" extractor to be added to the TextRazor request."""

    def __init__(self, response_word, link_index):
        self._response_word = response_word

        self._parent = None
        self._children = []

        self._entities = []
        self._entailments = []
        self._relations = []
        self._relation_params = []
        self._property_predicates = []
        self._property_properties = []
        self._noun_phrases = []

        for callback, arg in link_index.get(("word", self.position), []):
            callback(arg, self)

    def _add_child(self, child):
        self._children.append(child)

    def _set_parent(self, parent):
        self._parent = parent
        parent._add_child(self)

    def _add_entity(self, entity):
        self._entities.append(entity)

    def _add_entailment(self, entailment):
        self._entailments.append(entailment)

    def _add_relation(self, relation):
        self._relations.append(relation)

    def _add_relation_param(self, relation_param):
        self._relation_params.append(relation_param)

    def _add_property_predicate(self, property):
        self._property_predicates.append(property)

    def _add_property_properties(self, property):
        self._property_properties.append(property)

    def _add_noun_phrase(self, noun_phrase):
        self._noun_phrases.append(noun_phrase)

    @property
    def parent_position(self):
        """Returns the position of the grammatical parent of this word, or None if this word is either at the root
        of the sentence or the "dependency-trees" extractor was not requested."""
        return self._response_word.get("parentPosition", None)

    @property
    def parent(self):
        """Returns a link to the TextRazor word that is parent of this word, or None if this word is either at the root
        of the sentence or the "dependency-trees" extractor was not requested."""
        return self._parent

    @property
    def relation_to_parent(self):
        """Returns the grammatical relation between this word and its parent, or None if this word is either at the root
        of the sentence or the "dependency-trees" extractor was not requested.

        TextRazor parses into the Stanford uncollapsed dependencies, as detailed at:

        http://nlp.stanford.edu/software/dependencies_manual.pdf
        """
        return self._response_word.get("relationToParent", None)

    @property
    def children(self):
        """Returns a list of TextRazor words that make up the children of this word. Returns an empty list
        for leaf words, or if the "dependency-trees" extractor was not requested."""
        return self._children

    @property
    def position(self):
        """Returns the position of this word in its sentence."""
        return self._response_word.get("position", None)

    @property
    def stem(self):
        """Returns the stem of this word."""
        return self._response_word.get("stem", None)

    @property
    def lemma(self):
        """Returns the morphological root of this word, see http://en.wikipedia.org/wiki/Lemma_(morphology)
        for details."""
        return self._response_word.get("lemma", None)

    @property
    def token(self):
        """Returns the raw token string that matched this word in the source text."""
        return self._response_word.get("token", None)

    @property
    def part_of_speech(self):
        """Returns the Part of Speech that applies to this word. We use the Penn treebank tagset,
        as detailed here:

        http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html"""
        return self._response_word.get("partOfSpeech", None)

    @property
    def input_start_offset(self):
        """Returns the start offset in the input text for this token. Note that this offset applies to the
        original Unicode string passed in to the api, TextRazor treats multi byte utf8 characters as a single position."""
        return self._response_word.get("startingPos", None)

    @property
    def input_end_offset(self):
        """Returns the end offset in the input text for this token.
        Note that this offset applies to the original Unicode string passed in to the api,
        TextRazor treats multi byte utf8 characters as a single position."""
        return self._response_word.get("endingPos", None)

    @property
    def entailments(self):
        """Returns a list of :class:`Entailment` that this word entails."""
        return self._entailments

    @property
    def entities(self):
        """Returns a list of :class:`Entity` that this word is a part of."""
        return self._entities

    @property
    def relations(self):
        """Returns a list of :class:`Relation` that this word is a predicate of."""
        return self._relations

    @property
    def relation_params(self):
        """Returns a list of :class:`RelationParam` that this word is a member of."""
        return self._relation_params

    @property
    def property_properties(self):
        """Returns a list of :class:`Property` that this word is a property member of."""
        return self._property_properties

    @property
    def property_predicates(self):
        """Returns a list of :class:`Property` that this word is a predicate (or focus) member of."""
        return self._property_predicates

    @property
    def noun_phrases(self):
        """Returns a list of :class:`NounPhrase` that this word is a member of."""
        return self._noun_phrases

    @property
    def senses(self):
        """Returns a list of (sense, score) tuples representing scores of each Wordnet sense this word may be a part of."""
        return self._response_word.get("senses", [])

    def __repr__(self):
        return "TextRazor Word:\"%s\" at position %s" % ((self.token).encode("utf-8"), str(self.position))

    def __str__(self):
        out = ["TextRazor Word:", str(self.token.encode("utf-8")), "\n"]

        for property in dir(self):
            if not property.startswith("_") and not property == "token":
                out.extend([property, ":", repr(getattr(self, property)), "\n"])

        return " ".join(out)

class Sentence(object):
    """Represents a single sentence extracted by TextRazor."""

    def __init__(self, sentence_json, link_index):
        if "words" in sentence_json:
            self._words = [Word(word_json, link_index) for word_json in sentence_json["words"]]
        else:
            self._words = []

        self._add_links(link_index)

    def _add_links(self, link_index):
        if not self._words:
            return

        self._root_word = None

        # Add links between the parent/children of the dependency tree in this sentence.

        word_positions = {}
        for word in self._words:
            word_positions[word.position] = word

        for word in self._words:
            parent_position = word.parent_position
            if None != parent_position and parent_position >= 0:
                word._set_parent(word_positions[parent_position])
            else:
                # Punctuation does not get attached to any parent, any non punctuation part of speech
                # must be the root word.
                if word.part_of_speech not in ("$", "``", "''", "(", ")", ",", "--", ".", ":"):
                    self._root_word = word

    @property
    def root_word(self):
        """Returns the root word of this sentence if "dependency-trees" extractor was requested."""
        return self._root_word

    @property
    def words(self):
        """Returns a list of all the :class:`Word` in this sentence."""
        return self._words

class CustomAnnotation(object):

    def __init__(self, annotation_json, link_index):
        self._annotation_json = annotation_json

        for key_value in annotation_json.get("contents", []):
            for link in key_value.get("links", []):
                try:
                    link_index[(link["annotationName"], link["linkedId"])].append((self._register_link, link))
                except Exception as ex:
                    link_index[(link["annotationName"], link["linkedId"])] = [(self._register_link, link)]

    def _register_link(self, link, annotation):
        link["linked"] = annotation

        new_custom_annotation_list = []
        try:
            new_custom_annotation_list = getattr(annotation, self.name())
        except Exception as ex:
            pass
        new_custom_annotation_list.append(self)
        setattr(annotation, self.name(), new_custom_annotation_list)

    def name(self):
        return self._annotation_json["name"]

    def __getattr__(self, attr):
        exists = False
        for key_value in self._annotation_json["contents"]:
            if "key" in key_value and key_value["key"] == attr:
                exists = True
                for link in key_value.get("links", []):
                    try:
                        yield link["linked"]
                    except Exception as ex:
                        yield link
                for int_value in key_value.get("intValue", []):
                    yield int_value
                for float_value in key_value.get("floatValue", []):
                    yield float_value
                for str_value in key_value.get("stringValue", []):
                    yield str_value
                for bytes_value in key_value.get("bytesValue", []):
                    yield bytes_value

        if not exists:
            raise AttributeError("%r annotation has no attribute %r" % (self.name(), attr))

    def __repr__(self):
        return "TextRazor CustomAnnotation:\"%s\"" % (self._annotation_json["name"])

    def __str__(self):
        out = ["TextRazor CustomAnnotation:", str(self._annotation_json["name"]), "\n"]

        for key_value in self._annotation_json["contents"]:
            try:
                out.append("Param %s:" % key_value["key"])
            except Exception as ex:
                out.append("Param (unlabelled):")
            out.append("\n")
            for link in self.__getattr__(key_value["key"]):
                out.append(repr(link))
                out.append("\n")

        return " ".join(out)

class TextRazorResponse(object):
    """Represents a processed response from TextRazor."""

    def __init__(self, response_json):
        self.response_json = response_json

        self.sentences = []
        self.custom_annotations = []

        link_index = {}

        if "response" in self.response_json:
            # There's a bit of magic here. Each annotation registers a callback with the ids and types of annotation
            # that it is linked to. When the linked annotation is later parsed it adds the link via the callback.
            # This means that annotations must be added in order of the dependency between them.

            if "customAnnotations" in self.response_json["response"]:
                self.custom_annotations = [CustomAnnotation(json, link_index) for json in self.response_json["response"]["customAnnotations"]]

            if "topics" in self.response_json["response"]:
                self._topics = [Topic(topic_json, link_index) for topic_json in self.response_json["response"]["topics"]]

            if "coarseTopics" in self.response_json["response"]:
                self._coarse_topics = [Topic(topic_json, link_index) for topic_json in self.response_json["response"]["coarseTopics"]]

            if "entities" in self.response_json["response"]:
                self._entities = [Entity(entity_json, link_index) for entity_json in self.response_json["response"]["entities"]]
            else:
                self._entities = []

            if "entailments" in self.response_json["response"]:
                self._entailments = [Entailment(entailment_json, link_index) for entailment_json in self.response_json["response"]["entailments"]]
            else:
                self._entailments = []

            if "relations" in self.response_json["response"]:
                self._relations = [Relation(relation_json, link_index) for relation_json in self.response_json["response"]["relations"]]
            else:
                self._relations = []

            if "properties" in self.response_json["response"]:
                self._properties = [Property(property_json, link_index) for property_json in self.response_json["response"]["properties"]]
            else:
                self._properties = []

            if "nounPhrases" in self.response_json["response"]:
                self._noun_phrases = [NounPhrase(phrase_json, link_index) for phrase_json in self.response_json["response"]["nounPhrases"]]
            else:
                self._noun_phrases = []

            if "sentences" in self.response_json["response"]:
                self.sentences = [Sentence(sentence_json, link_index) for sentence_json in self.response_json["response"]["sentences"]]

    @property
    def cleaned_text(self):
        return self.response_json["response"].get("cleanedText", "")

    def summary(self):
        return """Request processed in: %s seconds. Num Sentences:%s""" % \
            (self.response_json["time"], len(self.response_json["response"]["sentences"]))

    def custom_annotation_output(self):
        """Returns any output generated while running the embedded prolog engine on your rules."""
        return self.response_json["response"].get("customAnnotationOutput", "")

    def coarse_topics(self):
        """Returns a list of all the coarse :class:`Topic` in the response."""
        return self._coarse_topics

    def topics(self):
        """Returns a list of all the :class:`Topic` in the response."""
        return self._topics

    def entities(self):
        """Returns a list of all the :class:`Entity` across all sentences in the response."""
        return self._entities

    def words(self):
        """Returns a generator of all :class:`Word` across all sentences in the response."""
        for sentence in self.sentences:
            for word in sentence.words:
                yield word

    def entailments(self):
        """Returns a list of all :class:`Entailment` across all sentences in the response."""
        return self._entailments

    def relations(self):
        """Returns a list of all :class:`Relation` across all sentences in the response."""
        return self._relations

    def properties(self):
        """Returns a list of all :class:`Property` across all sentences in the response."""
        return self._properties

    def noun_phrases(self):
        """Returns a list of all the :class:`NounPhrase` across all sentences in the response."""
        return self._noun_phrases

    def sentences(self):
        """Returns a list of all :class:`Sentence` in the response."""
        return self.sentences

    def matching_rules(self):
        return [custom_annotation.name() for custom_annotation in self.custom_annotations]

    def __getattr__(self, attr):
        exists = False
        for custom_annotation in self.custom_annotations:
            if custom_annotation.name() == attr:
                exists = True
                yield custom_annotation

        if not exists:
            raise AttributeError("TextRazor response has no annotation %r" % attr)

class TextRazor(object):
    """
    The main TextRazor client.
    To process your text, create a :class:`TextRazor` instance with your API key
    and set the extractors you need to process the text. Calls to :meth:`analyze`
    and :meth:`analyze_url` will then process raw text or URLs, returning a
    :class:`TextRazorResponse` on success.

    This class is threadsafe once initialized with the request options. You should
    create a new instance for each request if you are likely to be changing the
    request options in a multithreaded environment.

    Below is an entity extraction example from the tutorial, you can find more
    examples at http://www.textrazor.com/tutorials.

    >>> client = TextRazor(api_key="DEMO", extractors=["entities"])
    >>> client.set_do_cleanup_HTML(True)
    >>>
    >>> response = client.analyze_url("http://www.bbc.co.uk/news/uk-politics-18640916")
    >>>
    >>> entities = list(response.entities())
    >>> entities.sort(key=lambda x: x.relevance_score, reverse=True)
    >>>
    >>> seen = set()
    >>> for entity in entities:
    >>>     if entity.id not in seen:
    >>>         print entity.id, entity.relevance_score, entity.confidence_score, entity.freebase_types
    >>>         seen.add(entity.id)
    """

    _SECURE_TEXTRAZOR_ENDPOINT = "https://api.textrazor.com/"
    _TEXTRAZOR_ENDPOINT = "http://api.textrazor.com/"

    def __init__(self, api_key, extractors, do_compression=True, do_encryption=False):
        self.api_key = api_key
        self.extractors = extractors
        self.do_compression = do_compression
        self.do_encryption = do_encryption
        self.cleanup_html = False
        self.rules = ""
        self.language_override = None
        self.enrichment_queries = []
        self.dbpedia_type_filters = []
        self.freebase_type_filters = []
        self.allow_overlap = None

    def set_api_key(self, api_key):
        """Sets the TextRazor API key, required for all requests."""
        self.api_key = api_key

    def set_extractors(self, extractors):
        """Sets a list of "Extractors" which extract various information from your text.
        Only select the extractors that are explicitly required by your application for optimal performance.
        Any extractor that doesn't match one of the predefined list below will be assumed to be a custom Prolog extractor.

        Valid options are: words, phrases, entities, dependency-trees, relations, entailments.
        """
        self.extractors = extractors

    def set_rules(self, rules):
        """Sets a string containing Prolog logic. All rules matching an extractor name listed in the request
        will be evaluated and all matching param combinations linked in the response.
        """
        self.rules = rules

    def set_do_compression(self, do_compression):
        """When True, request gzipped responses from TextRazor. When expecting a large response this can
        significantly reduce bandwidth. Defaults to True."""
        self.do_compression = do_compression

    def set_do_encryption(self, do_encryption):
        """When True, all communication to TextRazor will be sent over SSL, when handling sensitive
        or private information this should be set to True. Defaults to False."""
        self.do_encryption = do_encryption

    def set_enrichment_queries(self, enrichment_queries):
        """Set a list of "Enrichment Queries", used to enrich the entity response with structured linked data.
        The syntax for these queries is documented at https://www.textrazor.com/enrichment
        """
        self.enrichment_queries = enrichment_queries

    def set_language_override(self, language_override):
        self.language_override = language_override

    def set_do_cleanup_HTML(self, cleanup_html):
        """When True, input text is treated as raw HTML and will be cleaned of tags, comments, scripts,
        and boilerplate content removed. When this option is enabled, the cleaned_text property is returned
        with the text content, providing access to the raw filtered text. When enabled, position offsets returned
        in individual words apply to the clean text, not the provided HTML."""
        self.cleanup_html = cleanup_html

    def set_entity_allow_overlap(self, allow_overlap):
        """When allow_overlap is True, entities in the response may overlap. When False, the "best" entity
        is found such that none overlap. Defaults to True.
""" self.allow_overlap = allow_overlap def set_entity_dbpedia_type_filters(self, filters): """Set a list of DBPedia types to filter entity extraction on. All returned entities must match at least one of these types.""" self.dbpedia_type_filters = filters def set_entity_freebase_type_filters(self, filters): """Set a list of Freebase types to filter entity extraction on. All returned entities must match at least one of these types.""" self.freebase_type_filters = filters def analyze_url(self, url, headers={}): """Given a url and optional dict of HTTP headers, first downloads the URL then processes the resulting text. If you expect HTML in the response, you may want to set :meth:`set_do_cleanup_HTML` to true to filter unwanted HTML content. Returns a :class:`TextRazorResponse` with the parsed data on success. Raises a :class:`TextRazorAnalysisException` on failure. """ req = Request(url, headers=headers) response = urlopen(req) text = response.read().decode("utf-8", "ignore") return self.analyze(text) def analyze(self, text): """Calls the TextRazor API with the provided unicode text. Returns a :class:`TextRazorResponse` with the parsed data on success. Raises a :class:`TextRazorAnalysisException` on failure. 
""" post_data = [("text", text.encode("utf-8")), ("apiKey", self.api_key), ("rules", self.rules), ("extractors", ",".join(self.extractors)), ("cleanupHTML", self.cleanup_html)] for filter in self.dbpedia_type_filters: post_data.append(("entities.filterDbpediaTypes", filter)) for filter in self.freebase_type_filters: post_data.append(("entities.filterFreebaseTypes", filter)) for query in self.enrichment_queries: post_data.append(("entities.enrichmentQueries", query)) if self.language_override != None: post_data.append(("languageOverride", self.language_override)) if self.allow_overlap != None: post_data.append(("entities.allowOverlap", self.allow_overlap)) encoded_post_data = urlencode(post_data) request_headers = {} if self.do_compression: request_headers['Accept-encoding'] = 'gzip' if self.do_encryption: request = Request(self._SECURE_TEXTRAZOR_ENDPOINT, headers=request_headers, data=encoded_post_data.encode("utf-8")) else: request = Request(self._TEXTRAZOR_ENDPOINT, headers=request_headers, data=encoded_post_data.encode("utf-8")) try: response = urlopen(request) except HTTPError as e: raise TextRazorAnalysisException("TextRazor returned HTTP Code %d: %s" % (e.code, e.read())) except URLError as e: raise TextRazorAnalysisException("Could not connect to TextRazor") if response.info().get('Content-Encoding') == 'gzip': buf = IOStream.StringIO( response.read()) response = gzip.GzipFile(fileobj=buf) response_json = json.loads(response.read().decode("utf-8")) return TextRazorResponse(response_json)