On Sat, Jul 6, 2013 at 5:02 PM, Denis Papathanasiou
<denis.papathanas...@gmail.com> wrote:
> On Saturday, July 6, 2013 1:22:32 PM UTC-4, Lars Nilsson wrote:
>>
>> [snip]
>>
>> If that kind of splitting is really all you require,
>> (clojure.string/split my-text #"[.!?;]") or
>> (re-seq #"[^.!?;]+" my-text)
> Is there any way to preserve the actual punctuation? That's why I was
> looking at partition-by and group-by instead.

You could try (re-seq #"[^.!?;]+[.!?;]?" my-text), which keeps the
trailing punctuation mark on each match. Or perhaps Jim's longer regex
is better suited (I didn't look at it in depth, but it is longer... :) )
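To illustrate the difference, here is a small REPL-style sketch. The sample text is made up for illustration, and only clojure.string/split, clojure.string/trim and clojure.core/re-seq are used. The first two forms drop the delimiters, while the third keeps each sentence's closing punctuation:

```clojure
(require '[clojure.string :as str])

;; Hypothetical sample text, made up for this illustration.
(def my-text "Hello there. How are you? Fine; thanks!")

;; Splitting on the punctuation discards it.
(def split-result (str/split my-text #"[.!?;]"))
;; => ["Hello there" " How are you" " Fine" " thanks"]

;; Matching the runs between punctuation marks also discards it.
(def seq-result (re-seq #"[^.!?;]+" my-text))
;; => ("Hello there" " How are you" " Fine" " thanks")

;; Allowing one optional punctuation mark after each run keeps it;
;; str/trim removes the leading space left over from the previous sentence.
(def kept-result (map str/trim (re-seq #"[^.!?;]+[.!?;]?" my-text)))
;; => ("Hello there." "How are you?" "Fine;" "thanks!")
```

Note that the optional `?` keeps only a single mark, so "Wait... ok" yields "Wait." and " ok"; using `[.!?;]*` instead keeps the whole run ("Wait..."). Abbreviations such as "e.g." will still be split, which is where a proper sentence detector earns its keep.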

>> For fancier stuff look into an opennlp wrapper or something like it.
>>
>> https://github.com/dakrone/clojure-opennlp
>
>
> This might be a better solution; thanks for mentioning it.

It is certainly what I would use if I were looking for decent text
parsing and were more interested in using the output than in
implementing tokenization, etc., myself.

Lars Nilsson

-- 
You received this message because you are subscribed to the Google
Groups "Clojure" group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en